
Factor Analysis – Steps, Methods and Examples

Factor Analysis

Definition:

Factor analysis is a statistical technique that is used to identify the underlying structure of a relatively large set of variables and to explain these variables in terms of a smaller number of common underlying factors. It helps to investigate the latent relationships between observed variables.

Factor Analysis Steps

Here are the general steps involved in conducting a factor analysis:

1. Define the Research Objective:

Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis.

2. Data Collection:

Gather the data on the variables of interest. These variables should be measurable and related to the research objective. Ensure that you have a sufficient sample size for reliable results.

3. Assess Data Suitability:

Examine the suitability of the data for factor analysis. Check for the following aspects:

  • Sample size: Ensure that you have an adequate sample size to perform factor analysis reliably.
  • Missing values: Handle missing data appropriately, either by imputation or exclusion.
  • Variable characteristics: Verify that the variables are continuous or at least ordinal in nature. Categorical variables may require different analysis techniques.
  • Linearity: Assess whether the relationships among variables are linear.

4. Determine the Factor Analysis Technique:

There are different types of factor analysis techniques available, such as exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Choose the appropriate technique based on your research objective and the nature of the data.

5. Perform Factor Analysis:

   a. Exploratory Factor Analysis (EFA):

  • Extract factors: Use factor extraction methods (e.g., principal component analysis or common factor analysis) to identify the initial set of factors.
  • Determine the number of factors: Decide on the number of factors to retain based on statistical criteria (e.g., eigenvalues, scree plot) and theoretical considerations.
  • Rotate factors: Apply factor rotation techniques (e.g., orthogonal rotations such as varimax, or oblique rotations such as promax) to simplify the factor structure and make it more interpretable.
  • Interpret factors: Analyze the factor loadings (correlations between variables and factors) to interpret the meaning of each factor.
  • Determine factor reliability: Assess the internal consistency or reliability of the factors using measures like Cronbach’s alpha.
  • Report results: Document the factor loadings, rotated component matrix, communalities, and any other relevant information.
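These steps can be sketched in a few lines of R. Below is a minimal example using the psych package and its built-in bfi personality items; the five-factor choice, the oblimin rotation, and the item subset are illustrative assumptions, not a prescription.

```r
# A minimal EFA sketch in R ('psych' package, built-in bfi data).
library(psych)

items <- na.omit(bfi[, 1:25])          # 25 personality items, complete cases

# Determine the number of factors (scree plot + parallel analysis)
fa.parallel(items, fa = "fa")

# Extract (principal axis factoring) and rotate (oblique)
efa <- fa(items, nfactors = 5, fm = "pa", rotate = "oblimin")
print(efa$loadings, cutoff = 0.30)     # interpret loadings above |0.30|

# Reliability of the items loading on one factor (Agreeableness items)
alpha(items[, c("A1", "A2", "A3", "A4", "A5")], check.keys = TRUE)
```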

   b. Confirmatory Factor Analysis (CFA):

  • Formulate a theoretical model: Specify the hypothesized relationships among variables and factors based on prior knowledge or theoretical considerations.
  • Define measurement model: Establish how each variable is related to the underlying factors by assigning factor loadings in the model.
  • Test the model: Use statistical techniques like maximum likelihood estimation or structural equation modeling to assess the goodness-of-fit between the observed data and the hypothesized model.
  • Modify the model: If the initial model does not fit the data adequately, revise the model by adding or removing paths, allowing for correlated errors, or other modifications to improve model fit.
  • Report results: Present the final measurement model, parameter estimates, fit indices (e.g., chi-square, RMSEA, CFI), and any modifications made.
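As a sketch of what these steps look like in software, the R code below fits a simple three-factor CFA with the lavaan package on its built-in HolzingerSwineford1939 tutorial dataset; substitute your own hypothesized measurement model.

```r
# CFA sketch (R, 'lavaan'). The dataset and three-factor model come from
# the lavaan tutorial; they stand in for your own model.
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Fit indices mentioned above: chi-square, RMSEA, CFI (plus TLI)
summary(fit, fit.measures = TRUE, standardized = TRUE)
fitMeasures(fit, c("chisq", "df", "pvalue", "rmsea", "cfi", "tli"))
```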

6. Interpret and Validate the Factors:

Once you have identified the factors, interpret them based on the factor loadings, theoretical understanding, and research objectives. Validate the factors by examining their relationships with external criteria or by conducting further analyses if necessary.

Types of Factor Analysis

Types of Factor Analysis are as follows:

Exploratory Factor Analysis (EFA)

EFA is used to explore the underlying structure of a set of observed variables without any preconceived assumptions about the number or nature of the factors. It aims to discover the number of factors and how the observed variables are related to those factors. EFA does not impose any restrictions on the factor structure and allows for cross-loadings of variables on multiple factors.

Confirmatory Factor Analysis (CFA)

CFA is used to test a pre-specified factor structure based on theoretical or conceptual assumptions. It aims to confirm whether the observed variables measure the latent factors as intended. CFA tests the fit of a hypothesized model and assesses how well the observed variables are associated with the expected factors. It is often used for validating measurement instruments or evaluating theoretical models.

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that can be considered a form of factor analysis, although it has some differences. PCA aims to explain the maximum amount of variance in the observed variables using a smaller number of uncorrelated components. Unlike traditional factor analysis, PCA does not assume that the observed variables are caused by underlying factors but focuses solely on accounting for variance.

Common Factor Analysis

It assumes that the observed variables are influenced by common factors and unique factors (specific to each variable). It attempts to estimate the common factor structure by extracting the shared variance among the variables while also considering the unique variance of each variable.

Hierarchical Factor Analysis

Hierarchical factor analysis involves multiple levels of factors. It explores both higher-order and lower-order factors, aiming to capture the complex relationships among variables. Higher-order factors are based on the relationships among lower-order factors, which are in turn based on the relationships among observed variables.

Factor Analysis Formulas

Factor Analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Here are some of the essential formulas and calculations used in factor analysis:

Correlation Matrix :

The first step in factor analysis is to create a correlation matrix, which calculates the correlation coefficients between pairs of variables.

Correlation coefficient (Pearson’s r) between variables X and Y is calculated as:

r(X,Y) = Σ[(xi – x̄)(yi – ȳ)] / [(n – 1) σx σy]

where: xi, yi are the data points, x̄, ȳ are the means of X and Y respectively, σx, σy are the standard deviations of X and Y respectively, n is the number of data points.
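As a quick sanity check, the formula can be computed by hand in R and compared against the built-in cor() function (the data values below are arbitrary):

```r
# Pearson's r by hand versus cor(); sd() uses the n-1 denominator,
# matching the formula above.
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 5, 9, 11)
n <- length(x)

r_manual <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))
c(manual = r_manual, builtin = cor(x, y))   # identical values
```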

Extraction of Factors :

Factors are typically extracted from the correlation matrix using methods such as Principal Component Analysis (PCA) or principal axis factoring.

The formula used in PCA to calculate the principal components (factors) involves finding the eigenvalues and eigenvectors of the correlation matrix.

Let’s denote the correlation matrix as R. If λ is an eigenvalue of R, and v is the corresponding eigenvector, they satisfy the equation: Rv = λv
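In R, the eigendecomposition is a single call; the built-in mtcars data is used below purely as a convenient numeric dataset:

```r
# Eigenvalues/eigenvectors of a correlation matrix in base R.
R <- cor(mtcars)
e <- eigen(R)
e$values                 # eigenvalues, in decreasing order

# Verify Rv = lambda * v for the first eigenpair
v <- e$vectors[, 1]
all.equal(as.vector(R %*% v), e$values[1] * v)
```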

Factor Loadings :

Factor loadings are the correlations between the original variables and the factors. They can be calculated as the eigenvectors scaled (multiplied) by the square roots of their corresponding eigenvalues.

Communality and Specific Variance :

Communality of a variable is the proportion of variance in that variable explained by the factors. It can be calculated as the sum of squared factor loadings for that variable across all factors.

The specific variance of a variable is the proportion of variance in that variable not explained by the factors, and it’s calculated as 1 – Communality.
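Continuing the base-R sketch above, loadings, communalities and specific variances follow directly from the eigendecomposition (retaining two factors is an arbitrary choice for illustration):

```r
# Loadings = eigenvectors scaled by sqrt(eigenvalues); communality is the
# row sum of squared loadings over the retained factors.
e <- eigen(cor(mtcars))
k <- 2                                        # factors retained (arbitrary)
L <- e$vectors %*% diag(sqrt(e$values))       # full loading matrix
rownames(L) <- colnames(mtcars)

communality <- rowSums(L[, 1:k, drop = FALSE]^2)
specific_variance <- 1 - communality
round(cbind(communality, specific_variance), 3)
```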

Factor Rotation : Factor rotation, such as Varimax or Promax, is used to make the output more interpretable. It doesn’t change the underlying relationships but affects the loadings of the variables on the factors.

For example, in the Varimax rotation the objective is to maximize the variance of the squared loadings within each factor (column) across the variables (rows) of the factor matrix. This drives each loading toward either a high or a near-zero value, making the factor easier to interpret.
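Base R ships a varimax() function that rotates a loading matrix directly. A small sketch, reusing the two-factor loadings from above:

```r
# Varimax rotation of a two-factor loading matrix (stats::varimax).
e <- eigen(cor(mtcars))
L2 <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
rownames(L2) <- colnames(mtcars)

rotated <- varimax(L2)
print(rotated$loadings, cutoff = 0.30)  # loadings pushed toward high/near-zero
```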

Examples of Factor Analysis

Here are some real-time examples of factor analysis:

  • Psychological Research: In a study examining personality traits, researchers may use factor analysis to identify the underlying dimensions of personality by analyzing responses to various questionnaires or surveys. Factors such as extroversion, neuroticism, and conscientiousness can be derived from the analysis.
  • Market Research: In marketing, factor analysis can be used to understand consumers’ preferences and behaviors. For instance, by analyzing survey data related to product features, pricing, and brand perception, researchers can identify factors such as price sensitivity, brand loyalty, and product quality that influence consumer decision-making.
  • Finance and Economics: Factor analysis is widely used in portfolio management and asset pricing models. By analyzing historical market data, factors such as market returns, interest rates, inflation rates, and other economic indicators can be identified. These factors help in understanding and predicting investment returns and risk.
  • Social Sciences: Factor analysis is employed in social sciences to explore underlying constructs in complex datasets. For example, in education research, factor analysis can be used to identify dimensions such as academic achievement, socio-economic status, and parental involvement that contribute to student success.
  • Health Sciences: In medical research, factor analysis can be utilized to identify underlying factors related to health conditions, symptom clusters, or treatment outcomes. For instance, in a study on mental health, factor analysis can be used to identify underlying factors contributing to depression, anxiety, and stress.
  • Customer Satisfaction Surveys: Factor analysis can help businesses understand the key drivers of customer satisfaction. By analyzing survey responses related to various aspects of product or service experience, factors such as product quality, customer service, and pricing can be identified, enabling businesses to focus on areas that impact customer satisfaction the most.

Factor Analysis in Research: Example

Here’s an example of how factor analysis might be used in research:

Let’s say a psychologist is interested in the factors that contribute to overall wellbeing. They conduct a survey with 1000 participants, asking them to respond to 50 different questions relating to various aspects of their lives, including social relationships, physical health, mental health, job satisfaction, financial security, personal growth, and leisure activities.

Given the broad scope of these questions, the psychologist decides to use factor analysis to identify underlying factors that could explain the correlations among responses.

After conducting the factor analysis, the psychologist finds that the responses can be grouped into five factors:

  • Physical Wellbeing : Includes variables related to physical health, exercise, and diet.
  • Mental Wellbeing : Includes variables related to mental health, stress levels, and emotional balance.
  • Social Wellbeing : Includes variables related to social relationships, community involvement, and support from friends and family.
  • Professional Wellbeing : Includes variables related to job satisfaction, work-life balance, and career development.
  • Financial Wellbeing : Includes variables related to financial security, savings, and income.

By reducing the 50 individual questions to five underlying factors, the psychologist can more effectively analyze the data and draw conclusions about the major aspects of life that contribute to overall wellbeing.

In this way, factor analysis helps researchers understand complex relationships among many variables by grouping them into a smaller number of factors, simplifying the data analysis process, and facilitating the identification of patterns or structures within the data.

When to Use Factor Analysis

Here are some circumstances in which you might want to use factor analysis:

  • Data Reduction : If you have a large set of variables, you can use factor analysis to reduce them to a smaller set of factors. This helps in simplifying the data and making it easier to analyze.
  • Identification of Underlying Structures : Factor analysis can be used to identify underlying structures in a dataset that are not immediately apparent. This can help you understand complex relationships between variables.
  • Validation of Constructs : Factor analysis can be used to confirm whether a scale or measure truly reflects the construct it’s meant to measure. If all the items in a scale load highly on a single factor, that supports the construct validity of the scale.
  • Generating Hypotheses : By revealing the underlying structure of your variables, factor analysis can help to generate hypotheses for future research.
  • Survey Analysis : If you have a survey with many questions, factor analysis can help determine if there are underlying factors that explain response patterns.

Applications of Factor Analysis

Factor Analysis has a wide range of applications across various fields. Here are some of them:

  • Psychology : It’s often used in psychology to identify the underlying factors that explain different patterns of correlations among mental abilities. For instance, factor analysis has been used to identify personality traits (like the Big Five personality traits), intelligence structures (like Spearman’s g), or to validate the constructs of different psychological tests.
  • Market Research : In this field, factor analysis is used to identify the factors that influence purchasing behavior. By understanding these factors, businesses can tailor their products and marketing strategies to meet the needs of different customer groups.
  • Healthcare : In healthcare, factor analysis is used in a similar way to psychology, identifying underlying factors that might influence health outcomes. For instance, it could be used to identify lifestyle or behavioral factors that influence the risk of developing certain diseases.
  • Sociology : Sociologists use factor analysis to understand the structure of attitudes, beliefs, and behaviors in populations. For example, factor analysis might be used to understand the factors that contribute to social inequality.
  • Finance and Economics : In finance, factor analysis is used to identify the factors that drive financial markets or economic behavior. For instance, factor analysis can help understand the factors that influence stock prices or economic growth.
  • Education : In education, factor analysis is used to identify the factors that influence academic performance or attitudes towards learning. This could help in developing more effective teaching strategies.
  • Survey Analysis : Factor analysis is often used in survey research to reduce the number of items or to identify the underlying structure of the data.
  • Environment : In environmental studies, factor analysis can be used to identify the major sources of environmental pollution by analyzing the data on pollutants.

Advantages of Factor Analysis

Advantages of Factor Analysis are as follows:

  • Data Reduction : Factor analysis can simplify a large dataset by reducing the number of variables. This helps make the data easier to manage and analyze.
  • Structure Identification : It can identify underlying structures or patterns in a dataset that are not immediately apparent. This can provide insights into complex relationships between variables.
  • Construct Validation : Factor analysis can be used to validate whether a scale or measure accurately reflects the construct it’s intended to measure. This is important for ensuring the reliability and validity of measurement tools.
  • Hypothesis Generation : By revealing the underlying structure of your variables, factor analysis can help generate hypotheses for future research.
  • Versatility : Factor analysis can be used in various fields, including psychology, market research, healthcare, sociology, finance, education, and environmental studies.

Disadvantages of Factor Analysis

Disadvantages of Factor Analysis are as follows:

  • Subjectivity : The interpretation of the factors can sometimes be subjective, depending on how the data is perceived. Different researchers might interpret the factors differently, which can lead to different conclusions.
  • Assumptions : Factor analysis assumes that there’s some underlying structure in the dataset and that all variables are related. If these assumptions do not hold, factor analysis might not be the best tool for your analysis.
  • Large Sample Size Required : Factor analysis generally requires a large sample size to produce reliable results. This can be a limitation in studies where data collection is challenging or expensive.
  • Correlation, not Causation : Factor analysis identifies correlational relationships, not causal ones. It cannot prove that changes in one variable cause changes in another.
  • Complexity : The statistical concepts behind factor analysis can be difficult to understand and require expertise to implement correctly. Misuse or misunderstanding of the method can lead to incorrect conclusions.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



Factor Analysis: a means for theory and instrument development in support of construct validity

Mohsen Tavakol

1 School of Medicine, Medical Education Centre, the University of Nottingham, UK

Angela Wetzel

2 School of Education, Virginia Commonwealth University, USA

Introduction

Factor analysis (FA) allows us to simplify a set of complex variables or items using statistical procedures to explore the underlying dimensions that explain the relationships between the multiple variables/items. For example, to explore inter-item relationships for a 20-item instrument, a basic analysis would produce a 20 × 20 matrix of 400 correlations; it is not an easy task to keep such matrices in our heads. FA simplifies a matrix of correlations so a researcher can more easily understand the relationship between items in a scale and the underlying factors that the items may have in common. FA is a commonly applied and widely promoted procedure for developing and refining clinical assessment instruments to produce evidence for the construct validity of the measure.

In the literature, the strong association between construct validity and FA is well documented, as the method provides evidence based on test content and evidence based on internal structure, key components of construct validity. 1 From FA, evidence based on internal structure and evidence based on test content can be examined to tell us what the instrument really measures - the intended abstract concept (i.e., a factor/dimension/construct) or something else. Establishing construct validity for the interpretations from a measure is critical to high quality assessment and subsequent research using outcomes data from the measure. Therefore, FA should be a researcher’s best friend during the development and validation of a new measure or when adapting a measure to a new population. FA is also a useful companion when critiquing existing measures for application in research or assessment practice. However, despite the popularity of FA, when applied in medical education instrument development, factor analytic procedures do not always match best practice. 2 This editorial article is designed to help medical educators use FA appropriately.

The Applications of FA

The applications of FA depend on the purpose of the research. Generally speaking, there are two main types of FA: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA).

Exploratory Factor Analysis

Exploratory Factor Analysis (EFA) is widely used in medical education research in the early phases of instrument development, specifically for measures of latent variables that cannot be assessed directly. Typically, in EFA, the researcher, through a review of the literature and engagement with content experts, selects as many instrument items as necessary to fully represent the latent construct (e.g., professionalism). Then, using EFA, the researcher explores the results of factor loadings, along with other criteria (e.g., previous theory, Minimum average partial, 3 Parallel analysis, 4 conceptual meaningfulness, etc.) to refine the measure. Suppose an instrument consisting of 30 questions yields two factors - Factor 1 and Factor 2. A good way to understand a factor as a theoretical construct is to look at its factor loadings. 5 The factor loading is the correlation between the item and the factor; a factor loading of more than 0.30 usually indicates a moderate correlation between the item and the factor. Most statistical software, such as SAS, SPSS and R, provides factor loadings. Upon review of the items loading on each factor, the researcher identifies two distinct constructs, with items loading on Factor 1 all related to professionalism, and items loading on Factor 2 related, instead, to leadership. Here, EFA helps the researcher build evidence based on internal structure by retaining only those items with appropriately high loadings on Factor 1 for professionalism, the construct of interest.

It is important to note that, often, Principal Component Analysis (PCA) is applied and described, in error, as exploratory factor analysis. 2 , 6 PCA is appropriate if the study primarily aims to reduce the number of original items in the intended instrument to a smaller set. 7 However, if the instrument is being designed to measure a latent construct, EFA, using Maximum Likelihood (ML) or Principal Axis Factoring (PAF), is the appropriate method. 7 These exploratory procedures statistically analyze the interrelationships between the instrument items and domains to uncover the unknown underlying factorial structure (dimensions) of the construct of interest. PCA, by design, seeks to explain total variance (i.e., common, specific and error variance) in the correlation matrix.

The sum of the squared loadings on a factor matrix for a particular item indicates the proportion of variance for that given item that is explained by the factors. This is called the communality. The higher the communality value, the more the extracted factors explain the variance of the item. Further, the mean of the sum of the squared factor loadings specifies the proportion of variance explained by each factor. For example, assume four items of an instrument load on Factor 1 with loadings of 0.86, 0.75, 0.66 and 0.58, respectively. Squaring an item's loading gives the percentage of that item's variance explained by Factor 1: here, 74%, 56%, 44% and 34% for item1, item2, item3 and item4, respectively. Summing the squared loadings of Factor 1 gives its eigenvalue, 2.1, and dividing the eigenvalue by four (2.1/4 = 0.52) gives the proportion of variance accounted for by Factor 1, which is 52%.

Since PCA does not separate specific variance and error variance, it often inflates factor loadings and limits the potential for the factor structure to be generalized and applied with other samples in subsequent study. On the other hand, Maximum Likelihood and Principal Axis Factoring extraction methods separate common and unique variance (specific and error variance), which overcomes the issue attached to PCA. Thus, the proportion of variance explained by an extracted factor more precisely reflects the extent to which the latent construct is measured by the instrument items. This focus on shared variance among items explained by the underlying factor, particularly during instrument development, helps the researcher understand the extent to which a measure captures the intended construct. It is useful to mention that in PAF the initial communalities are not set at 1s; they are chosen based on the squared multiple correlation coefficient. Indeed, if you run a multiple regression to predict, say, item1 (dependent variable) from the other items (independent variables) and then look at the R-squared (R²), you will see that R² is equal to the communality of item1 derived from PAF.
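The contrast described above can be seen in a few lines of R with the psych package's built-in bfi items (the two-factor choice is arbitrary):

```r
# PCA versus principal axis factoring on the same items ('psych' package).
library(psych)
items <- na.omit(bfi[, 1:10])

pca <- principal(items, nfactors = 2, rotate = "none")      # total variance
paf <- fa(items, nfactors = 2, fm = "pa", rotate = "none")  # common variance

# PAF communalities are typically lower: specific/error variance is excluded
round(cbind(PCA = pca$communality, PAF = paf$communality), 2)
```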

Confirmatory Factor Analysis

When prior EFA studies are available for your intended instrument, Confirmatory Factor Analysis extends those findings, allowing you to confirm or disconfirm the underlying factor structures, or dimensions, extracted in prior research. CFA is a theory- or model-driven approach that tests how well the data “fit” the proposed model or theory. CFA thus departs from EFA in that researchers must first identify a factor model before analysing the data. More fundamentally, CFA is a means for statistically testing the internal structure of instruments, and relies on maximum likelihood estimation (MLE) and a different set of standards for assessing the suitability of the construct of interest. 7 , 8

Factor analysts usually use a path diagram to show the theoretical and hypothesized relationships between items and factors, creating a hypothetical model to test using the ML method. In the path diagram, circles or ovals represent factors and rectangles represent instrument items. Lines (→ or ↔) represent relationships between items; no line, no relationship. A single-headed arrow shows a causal relationship (the variable the arrowhead points to is the dependent variable), and a double-headed arrow shows a covariance between variables or factors.

If CFA indicates the primary factors, or first-order factors, produced by the prior PAF are correlated, then second-order factors need to be modelled and estimated to get a greater understanding of the data. It should be noted that if the prior EFA applied an orthogonal rotation to the factor solution, the factors produced would be uncorrelated, and hence the analysis of second-order factors is not possible. Generally, in social science research, most constructs assume inter-related factors, and therefore an oblique rotation should be applied. The justification for analyzing second-order factors is that when correlations between the primary factors exist, CFA can then statistically model a broad picture of factors not captured by the primary factors (i.e., the first-order factors). 9 The analysis of first-order factors is like surveying mountains with zoom-lens binoculars, while the analysis of second-order factors uses a wide-angle lens. 10 Goodness-of-fit tests need to be conducted when evaluating the hypothetical model tested by CFA. The question is: does the new data fit the hypothetical model? However, the statistical models behind goodness-of-fit tests are complex and extend beyond the scope of this editorial paper; thus, we strongly encourage readers to consult with factor analysts for resources and advice.
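A sketch of a second-order model in lavaan, again using the package's built-in tutorial data; the higher-order factor g and the three first-order factors are assumed for illustration:

```r
# Second-order CFA: first-order factors load on a higher-order factor.
library(lavaan)

model2 <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  g       =~ visual + textual + speed   # second-order factor
'
fit2 <- cfa(model2, data = HolzingerSwineford1939)
fitMeasures(fit2, c("chisq", "df", "rmsea", "cfi"))
```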

Conclusions

Factor analysis methods can be incredibly useful tools for researchers attempting to establish high quality measures of constructs that cannot be observed directly. Specifically, the factor solution derived from an Exploratory Factor Analysis provides a snapshot of the statistical relationships of the key behaviors, attitudes, and dispositions of the construct of interest. This snapshot provides critical evidence for the validity of the measure based on the fit of the test content to the theoretical framework that underlies the construct. Further, the relationships between factors, which can be explored with EFA and confirmed with CFA, help researchers interpret the theoretical connections between underlying dimensions of a construct, and even extend to relationships across constructs in a broader theoretical model. However, studies that do not apply recommended extraction, rotation, and interpretation in FA risk drawing faulty conclusions about the validity of a measure. As measures are picked up by other researchers and applied in experimental designs, or by practitioners as assessments in practice, application of measures with subpar evidence for validity produces a ripple effect across the field. It is incumbent on researchers to ensure best practices are applied, or to engage with methodologists to support and consult where there are gaps in knowledge of methods. Further, it remains important to critically evaluate measures selected for research and practice, focusing on those that demonstrate alignment with best practice for FA and instrument development. 7 , 11

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Factor analysis and how it simplifies research findings.

There are many forms of data analysis used to report on and study survey data. Factor analysis is best when used to simplify complex data sets with many variables.

What is factor analysis?

Factor analysis is the practice of condensing many variables into just a few, so that your research data is easier to work with.

For example, a retail business trying to understand customer buying behaviours might consider variables such as ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’. Factor analysis can help condense these variables into a single factor, such as ‘customer purchase satisfaction’.


The theory is that there are deeper factors driving the underlying concepts in your data, and that you can uncover and work with them instead of dealing with the lower-level variables that cascade from them. Know that these deeper concepts aren’t necessarily immediately obvious – they might represent traits or tendencies that are hard to measure, such as extraversion or IQ.

Factor analysis is also sometimes called “dimension reduction”: you can reduce the “dimensions” of your data into one or more “super-variables,” also known as unobserved variables or latent variables. This process involves creating a factor model and often yields a factor matrix that organizes the relationship between observed variables and the factors they’re associated with.

As with any kind of process that simplifies complexity, there is a trade-off between the accuracy of the data and how easy it is to work with. With factor analysis, the best solution is the one that yields a simplification that represents the true nature of your data, with minimum loss of precision. This often means finding a balance between maximizing the variance explained by the model and using fewer factors to keep the model simple.

Factor analysis isn’t a single technique, but a family of statistical methods that can be used to identify the latent factors driving observable variables. Factor analysis is commonly used in market research, as well as other disciplines like technology, medicine, sociology, field biology, education, psychology and many more.

What is a factor?

In the context of factor analysis, a factor is a hidden or underlying variable that we infer from a set of directly measurable variables.

Take ‘customer purchase satisfaction’ as an example again. This isn’t a variable you can directly ask a customer to rate, but it can be determined from the responses to correlated questions like ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’.

While not directly observable, factors are essential for providing a clearer, more streamlined understanding of data. They enable us to capture the essence of our data’s complexity, making it simpler and more manageable to work with, and without losing lots of information.


Key concepts in factor analysis

These concepts are the foundational pillars that guide the application and interpretation of factor analysis.

Variance

Central to factor analysis, variance measures how much numerical values differ from the average. In factor analysis, you’re essentially trying to understand how underlying factors influence this variance among your variables. Some factors will explain more variance than others, meaning they more accurately represent the variables they consist of.

Eigenvalue

The eigenvalue expresses the amount of variance a factor explains. If a factor solution (unobserved or latent variables) has an eigenvalue of 1 or above, it indicates that a factor explains more variance than a single observed variable, which can be useful in reducing the number of variables in your analysis. Factors with eigenvalues less than 1 account for less variability than a single variable and are generally not included in the analysis.
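As a sketch, the eigenvalue-greater-than-one rule takes a few lines of base R (mtcars stands in for your own data):

```r
# Kaiser criterion: retain factors with eigenvalues >= 1.
ev <- eigen(cor(mtcars))$values
ev
sum(ev >= 1)                                   # number of factors retained
plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue")  # scree plot
abline(h = 1, lty = 2)
```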

Factor score

A factor score is a numeric representation that tells us how strongly each variable from the original data is related to a specific factor. Also called the component score, it can help determine which variables are most influenced by each factor and are most important for each underlying concept.

Factor loading

Factor loading is the correlation coefficient for the variable and factor. Like the factor score, factor loadings give an indication of how much of the variance in an observed variable can be explained by the factor. High factor loadings (close to 1 or -1) mean the factor strongly influences the variable.

When to use factor analysis

Factor analysis is a powerful tool when you want to simplify complex data, find hidden patterns, and set the stage for deeper, more focused analysis.

It’s typically used when you’re dealing with a large number of interconnected variables, and you want to understand the underlying structure or patterns within this data. It’s particularly useful when you suspect that these observed variables could be influenced by some hidden factors.

For example, consider a business that has collected extensive customer feedback through surveys. The survey covers a wide range of questions about product quality, pricing, customer service and more. This huge volume of data can be overwhelming, and this is where factor analysis comes in. It can help condense these numerous variables into a few meaningful factors, such as ‘product satisfaction’, ‘customer service experience’ and ‘value for money’.

Factor analysis doesn’t operate in isolation – it’s often used as a stepping stone for further analysis. For example, once you’ve identified key factors through factor analysis, you might then proceed to a cluster analysis – a method that groups your customers based on their responses to these factors. The result is a clearer understanding of different customer segments, which can then guide targeted marketing and product development strategies.

By combining factor analysis with other methodologies, you can not only make sense of your data but also gain valuable insights to drive your business decisions.

Factor analysis assumptions

Factor analysis relies on several assumptions for accurate results. Violating these assumptions may lead to factors that are hard to interpret or misleading.

Linear relationships between variables

This ensures that changes in the values of your variables are consistent.

Sufficient variables for each factor

If only a few variables represent a factor, it might not be identified accurately.

Adequate sample size

The larger the ratio of cases (respondents, for instance) to variables, the more reliable the analysis.

No perfect multicollinearity and singularity

No variable is a perfect linear combination of other variables, and no variable is a duplicate of another.

Relevance of the variables

There should be some correlation between variables to make a factor analysis feasible.


Types of factor analysis

There are two main factor analysis methods: exploratory and confirmatory. Here’s how they are used to add value to your research process.

Confirmatory factor analysis

In this type of analysis, the researcher starts out with a hypothesis about their data that they are looking to prove or disprove. Factor analysis will confirm – or not – where the latent variables are and how much variance they account for.

Principal component analysis (PCA) is often discussed alongside factor analysis, although strictly speaking it is a data-reduction technique rather than a form of confirmatory factor analysis. Using this method, the researcher will run the analysis to obtain multiple possible solutions that split their data among a number of factors. Items that load onto a single particular factor are more strongly related to one another and can be grouped together by the researcher using their conceptual knowledge or pre-existing research.

Using PCA will generate a range of solutions with different numbers of factors, from simplified 1-factor solutions to higher levels of complexity. However, the fewer factors employed, the less variance will be accounted for in the solution.

Exploratory factor analysis

As the name suggests, exploratory factor analysis is undertaken without a hypothesis in mind. It’s an investigatory process that helps researchers understand whether associations exist between the initial variables, and if so, where they lie and how they are grouped.

How to perform factor analysis: A step-by-step guide

Performing a factor analysis involves a series of steps, often facilitated by statistical software packages like SPSS, Stata and the R programming language. Here’s a simplified overview of the process.


Prepare your data

Start with a dataset where each row represents a case (for example, a survey respondent), and each column is a variable you’re interested in. Ensure your data meets the assumptions necessary for factor analysis.

Create an initial hypothesis

If you have a theory about the underlying factors and their relationships with your variables, make a note of this. This hypothesis can guide your analysis, but keep in mind that the beauty of factor analysis is its ability to uncover unexpected relationships.

Choose the type of factor analysis

The most common type is exploratory factor analysis, which is used when you’re not sure what to expect. If you have a specific hypothesis about the factors, you might use confirmatory factor analysis.

Form your correlation matrix

After you’ve chosen the type of factor analysis, you’ll need to create the correlation matrix of your variables. This matrix, which shows the correlation coefficients between each pair of variables, forms the basis for the extraction of factors. This is a key step in building your factor analysis model.

Decide on the extraction method

Principal component analysis is the most commonly used extraction method. If you are measuring a latent construct, you might opt for principal axis factoring, a type of factor analysis that identifies factors based on shared variance rather than total variance.

Determine the number of factors

Various criteria can be used here, such as Kaiser’s criterion (eigenvalues greater than 1), the scree plot method or parallel analysis. The choice depends on your data and your goals.
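For instance, parallel analysis together with a scree plot is a single call in the psych package (the built-in bfi items are used as stand-in data):

```r
# Scree plot plus parallel analysis for both components and factors.
library(psych)
fa.parallel(na.omit(bfi[, 1:25]), fa = "both")
```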

Interpret and validate your results

Each factor will be associated with a set of your original variables, so label each factor based on how you interpret these associations. These labels should represent the underlying concept that ties the associated variables together.

Validation can be done through a variety of methods, like splitting your data in half and checking if both halves produce the same factors.
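A rough sketch of that split-half check in R with the psych package; the five-factor solution is an assumption carried over from the usual bfi example:

```r
# Split-half validation: run the same EFA on two random halves and compare.
library(psych)
items <- na.omit(bfi[, 1:25])

set.seed(123)
half <- sample(nrow(items), nrow(items) %/% 2)
fa1 <- fa(items[half, ],  nfactors = 5, fm = "pa", rotate = "oblimin")
fa2 <- fa(items[-half, ], nfactors = 5, fm = "pa", rotate = "oblimin")

# Congruence coefficients near 1 indicate the same factors in both halves
factor.congruence(fa1, fa2)
```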

How factor analysis can help you

As well as giving you fewer variables to navigate, factor analysis can help you understand grouping and clustering in your input variables, since they’ll be grouped according to the latent variables.

Say you ask several questions all designed to explore different, but closely related, aspects of customer satisfaction:

  • How satisfied are you with our product?
  • Would you recommend our product to a friend or family member?
  • How likely are you to purchase our product in the future?

But you only want one variable to represent a customer satisfaction score. One option would be to average the three question responses. Another option would be to create a factor dependent variable. This can be done by running a principal component analysis (PCA) and keeping the first principal component (also known as a factor). The advantage of a PCA over an average is that it automatically weights each of the variables in the calculation.
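A brief sketch of the PCA option in R; q1 to q3 are hypothetical satisfaction items on a common scale, simulated here only so the example runs:

```r
# First principal component as a single satisfaction score.
set.seed(7)
df <- data.frame(q1 = sample(1:5, 100, replace = TRUE),
                 q2 = sample(1:5, 100, replace = TRUE),
                 q3 = sample(1:5, 100, replace = TRUE))

pca <- prcomp(df, scale. = TRUE)      # PCA weights the items automatically
df$satisfaction <- pca$x[, 1]         # first principal component as score
head(df)
```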

Say you have a list of questions and you don’t know exactly which responses will move together and which will move differently; for example, purchase barriers of potential customers. The following are possible barriers to purchase:

  • Price is prohibitive
  • Overall implementation costs
  • We can’t reach a consensus in our organization
  • Product is not consistent with our business strategy
  • I need to develop an ROI, but cannot or have not
  • We are locked into a contract with another product
  • The product benefits don’t outweigh the cost
  • We have no reason to switch
  • Our IT department cannot support your product
  • We do not have sufficient technical resources
  • Your product does not have a feature we require
  • Other (please specify)

Factor analysis can uncover the trends of how these questions will move together. The following are loadings for 3 factors for each of the variables.

[Image: table of loadings on three factors for each purchase-barrier variable]

Notice how each of the principal components has high weights for a subset of the variables. Weight is used interchangeably with loading, and a high weight indicates the variables that are most influential for each principal component. A weight of ±0.30 or more is generally considered heavy.

The first component displays heavy weights for variables related to cost, the second weights variables related to IT, and the third weights variables related to organizational factors. We can give our new super variables clever names.

[Image: the same loading table with the three components labelled as cost, IT, and organizational barriers]

If we were to cluster the customers based on these three components, we can see some trends. Customers tend to be high in cost barriers or organizational barriers, but not both.
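A sketch of that clustering step in R (psych's bfi items again as stand-in data; three components and two clusters are assumptions):

```r
# k-means clustering on component scores.
library(psych)
items <- na.omit(bfi[, 1:10])
scores <- principal(items, nfactors = 3, rotate = "varimax")$scores

set.seed(1)
km <- kmeans(scores, centers = 2)
table(km$cluster)                     # cluster sizes
```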

[Image: scatter plot of respondents on the cost-barrier and organizational-barrier components. The red dots represent respondents who indicated higher organizational barriers; the green dots represent respondents who indicated higher cost barriers.]

Considerations when using factor analysis

Factor analysis is a tool, and like any tool its effectiveness depends on how you use it. When employing factor analysis, it’s essential to keep a few key considerations in mind.

Oversimplification

While factor analysis is great for simplifying complex data sets, there’s a risk of oversimplification when grouping variables into factors. To avoid this you should ensure the reduced factors still accurately represent the complexities of your variables.

Subjectivity

Interpreting the factors can sometimes be subjective, and requires a good understanding of the variables and the context. Be mindful that multiple analysts may come up with different names for the same factor.

Supplementary techniques

Factor analysis is often just the first step. Consider how it fits into your broader research strategy and which other techniques you’ll use alongside it.

Examples of factor analysis studies

Factor analysis, including PCA, is often used in tandem with segmentation studies. It might be an intermediary step to reduce variables before using KMeans to make the segments.

Factor analysis provides simplicity after reducing variables. For long studies with large blocks of Matrix Likert scale questions, the number of variables can become unwieldy. Simplifying the data using factor analysis helps analysts focus and clarify the results, while also reducing the number of dimensions they’re clustering on.

Sample questions for factor analysis

Choosing exactly which questions to perform factor analysis on is both an art and a science. Choosing which variables to reduce takes some experimentation, patience and creativity. Factor analysis works well on Likert scale questions and Sum to 100 question types.

Factor analysis works well on matrix blocks of the following question genres:

Psychographics (Agree/Disagree):

  • I value family
  • I believe brand represents value

Behavioral (Agree/Disagree):

  • I purchase the cheapest option
  • I am a bargain shopper

Attitudinal (Agree/Disagree):

  • The economy is not improving
  • I am pleased with the product

Activity-Based (Agree/Disagree):

  • I love sports
  • I sometimes shop online during work hours

Behavioral and psychographic questions are especially suited for factor analysis.

Sample output reports

Factor analysis produces a score for each respondent on each factor (a weighted combination of their responses, using the loadings). These factor scores can be used like other responses in the survey.


Lesson 12: Factor Analysis – Overview

Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.

Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis, we model the observed variables as linear functions of the “factors.” In principal components, we create new variables that are linear combinations of the observed variables.  In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally, we like each variable to contribute significantly to only one component. A technique called factor rotation is employed toward that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology among others.

Upon completing this lesson, you should be able to:

  • Understand the terminology of factor analysis, including the interpretation of factor loadings, specific variances, and communalities;
  • Understand how to apply both principal component and maximum likelihood methods for estimating the parameters of a factor model;
  • Understand factor rotation, and interpret rotated factor loadings.
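As a pointer for the estimation methods named above, base R's factanal() fits the maximum likelihood factor model directly; a minimal sketch on the built-in mtcars data, with two factors assumed:

```r
# Maximum likelihood factor analysis in base R (stats::factanal).
fit <- factanal(mtcars, factors = 2, rotation = "varimax")
print(fit$loadings, cutoff = 0.30)
fit$uniquenesses      # specific variances (1 - communality)
```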

Institute for Digital Research and Education

A Practical Introduction to Factor Analysis

Factor analysis is a method for modeling observed variables and their covariance structure in terms of unobserved variables (i.e., factors). There are two types of factor analyses, exploratory and confirmatory. Exploratory factor analysis (EFA) is a method to explore the underlying structure of a set of observed variables, and is a crucial step in the scale development process. The first step in EFA is factor extraction. During this seminar, we will discuss how principal components analysis and common factor analysis differ in their approach to variance partitioning. Common factor analysis models can be estimated using various estimation methods such as principal axis factoring and maximum likelihood, and we will compare the practical differences between these two methods. After extracting the best factor structure, we can obtain a more interpretable factor solution through factor rotation. Here is where we will discuss the difference between orthogonal and oblique rotations, and finally how to use the final solution to generate factor scores. For the latter portion of the seminar we will introduce confirmatory factor analysis (CFA), which is a method to verify a factor structure that has already been defined. Topics include identification, model fit, and degrees of freedom, demonstrated through a three-item, two-item and eight-item one-factor CFA and a two-factor CFA. SPSS will be used for the EFA portion of the seminar and R (lavaan) will be used for the CFA portion.

I. Exploratory Factor Analysis (EFA)

  • Motivating example: The SAQ
  • Pearson correlation formula
  • Partitioning the variance in factor analysis
  • principal components analysis
  • principal axis factoring
  • maximum likelihood
  • Simple Structure
  • Orthogonal rotation (Varimax)
  • Oblique (Direct Oblimin)
  • Generating factor scores

II. Confirmatory Factor Analysis (CFA)

  • Motivating example: SPSS Anxiety Questionnaire
  • The factor analysis model
  • The model-implied covariance matrix
  • The path diagram
  • Known values, parameters, and degrees of freedom
  • Three-item one-factor analysis
  • Identification of a three-item one factor CFA
  • Running a one-factor CFA in lavaan
  • (Optional) How to manually obtain the standardized solution
  • (Optional) Degrees of freedom with means
  • One factor CFA with two items
  • One factor CFA with more than three items (SAQ-8)
  • Model chi-square
  • A note on sample size
  • (Optional) Model test of the baseline or null model
  • Incremental versus absolute fit index
  • CFI (Comparative Fit Index)
  • TLI (Tucker Lewis Index)
  • Uncorrelated factors
  • Correlated factors
  • Second-Order CFA
  • (Optional) Warning message with second-order CFA
  • (Optional) Obtaining the parameter table



Factor Analysis 101: The Basics


What is Factor Analysis?

Factor analysis is a powerful data reduction technique that enables researchers to investigate concepts that cannot easily be measured directly. By boiling down a large number of variables into a handful of comprehensible underlying factors, factor analysis results in easy-to-understand, actionable data. 

By applying this method to your research, you can spot trends faster and see themes throughout your datasets, enabling you to learn what the data points have in common. 

Unlike statistical methods such as regression analysis, factor analysis does not require you to designate dependent and independent variables in advance.

Factor analysis is most commonly used to identify the relationship between all of the variables included in a given dataset.

The Objectives of Factor Analysis

 Think of factor analysis as shrink wrap. When applied to a large amount of data, it compresses the set into a smaller set that is far more manageable, and easier to understand. 

The overall objective of factor analysis can be broken down into four smaller objectives: 

  • To determine how many factors are needed to explain common themes amongst a given set of variables.
  • To determine the extent to which each variable in the dataset is associated with a common theme or factor.
  • To provide an interpretation of the common factors in the dataset.
  • To determine the degree to which each observed data point represents each theme or factor.

When to Use Factor Analysis

Determining when to use particular statistical methods to get the most insight out of your data can be tricky.

When considering factor analysis, have your goal top-of-mind.

There are three main forms of factor analysis. If your goal aligns with any of these forms, then factor analysis is the right statistical method for the job:

Exploratory Factor Analysis should be used when you need to develop a hypothesis about a relationship between variables.

Confirmatory Factor Analysis should be used to test a hypothesis about the relationship between variables.

Construct Validity should be used to test the degree to which your survey actually measures what it is intended to measure.

How To Ensure Your Survey is Optimized for Factor Analysis

If you know that you’ll want to perform a factor analysis on response data from a survey, there are a few things you can do ahead of time to ensure that your analysis will be straightforward, informative, and actionable.

Identify and Target Enough Respondents

Large datasets are the lifeblood of factor analysis. You’ll need large groups of survey respondents, often found through panel services, for factor analysis to yield significant results.

While variables such as population size and your topic of interest will influence how many respondents you need, it’s best to maintain a “the more respondents, the better” mindset.

The More Questions, The Better

While designing your survey, load in as many specific questions as possible. Factor analysis will fall flat if your survey only has a few broad questions.

The ultimate goal of factor analysis is to take a broad concept and simplify it by considering more granular, contextual information, so this approach will provide you with the results you’re looking for.

Aim for Quantitative Data

If you’re looking to perform a factor analysis, you’ll want to avoid having open-ended survey questions.

By providing answer options in the form of scales (whether they be Likert scales, numerical scales, or even ‘yes/no’ scales) you’ll save yourself a world of trouble when you begin conducting your factor analysis. Just make sure that you’re using the same scaled answer options as often as possible.




9 Factor Analysis Overview

You can load this file with open_template_file("factoranalysis").

Background: https://usq.pressbooks.pub/statisticsforresearchstudents/part/factor-analysis/ and https://advstats.psychstat.org/book/factor/efa.php provide overviews, though there are many introductions that cover the same general territory.

Sage: Kim, J., & Mueller, C. W. (1978). Factor analysis. SAGE Publications, Inc., https://doi.org/10.4135/9781412984256

Finch, W. (2020). Exploratory factor analysis. SAGE Publications, Inc., https://doi.org/10.4135/9781544339900

We have seen that the scales in the ethics1 data set are at least minimally internally reliable, but the weakest ones are DS_OD and DS_ID.

ggplot(data = ethics1, aes(x = DS_OD, y = DS_ID)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE)

[Figure: jittered scatterplot of DS_OD against DS_ID with a fitted regression line]

Clearly they are positively associated, but there are some high values on both scales that are potentially very influential, making the line seem steeper than it might be if they weren’t there.

Remember that DS_ID was constructed from 7 items and DS_OD used 12 items. If you look at the ethics vignette you can see the specific items.

Looking at the correlation matrix for the original 19 variables, we can see that these correlations vary quite a bit.

# This creates a data frame with just our variables of interest.
ethics1[complete.cases(ethics1) == TRUE, ] |>
  select(starts_with("DS_ID_") | starts_with("DS_OD_")) |>
  haven::zap_labels() -> DS_data
# This creates a correlation matrix for those variables.
DS_data |> cor() -> DS_cor
# Rounding will make it easier to read.
round(DS_cor, 2)

##          DS_ID_1 DS_ID_2 DS_ID_3 DS_ID_4 DS_ID_5 DS_ID_6 DS_ID_7 DS_OD_1
## DS_ID_1 1.00 0.47 0.43 0.49 0.42 0.46 0.28 0.22
## DS_ID_2 0.47 1.00 0.57 0.46 0.51 0.62 0.44 0.31
## DS_ID_3 0.43 0.57 1.00 0.43 0.33 0.46 0.20 0.34
## DS_ID_4 0.49 0.46 0.43 1.00 0.47 0.38 0.17 0.24
## DS_ID_5 0.42 0.51 0.33 0.47 1.00 0.20 0.52 0.16
## DS_ID_6 0.46 0.62 0.46 0.38 0.20 1.00 0.43 0.35
## DS_ID_7 0.28 0.44 0.20 0.17 0.52 0.43 1.00 0.28
## DS_OD_1 0.22 0.31 0.34 0.24 0.16 0.35 0.28 1.00
## DS_OD_2 0.33 0.32 0.23 0.22 0.20 0.21 0.23 0.27
## DS_OD_3 0.06 0.14 0.43 0.06 0.06 0.15 0.13 0.39
## DS_OD_4 0.43 0.36 0.30 0.28 0.23 0.42 0.25 0.46
## DS_OD_5 0.31 0.22 0.27 0.24 0.16 0.29 0.25 0.28
## DS_OD_6 0.30 0.37 0.30 0.28 0.35 0.30 0.38 0.32
## DS_OD_7 0.34 0.57 0.45 0.32 0.24 0.50 0.31 0.29
## DS_OD_8 0.38 0.43 0.26 0.18 0.16 0.38 0.34 0.40
## DS_OD_9 0.35 0.43 0.42 0.29 0.15 0.51 0.31 0.55
## DS_OD_10 0.13 0.10 0.13 0.08 0.09 0.10 0.14 0.50
## DS_OD_11 0.36 0.47 0.25 0.18 0.20 0.37 0.29 0.39
## DS_OD_12 0.28 0.36 0.37 0.34 0.26 0.38 0.37 0.67
##          DS_OD_2 DS_OD_3 DS_OD_4 DS_OD_5 DS_OD_6 DS_OD_7 DS_OD_8 DS_OD_9
## DS_ID_1 0.33 0.06 0.43 0.31 0.30 0.34 0.38 0.35
## DS_ID_2 0.32 0.14 0.36 0.22 0.37 0.57 0.43 0.43
## DS_ID_3 0.23 0.43 0.30 0.27 0.30 0.45 0.26 0.42
## DS_ID_4 0.22 0.06 0.28 0.24 0.28 0.32 0.18 0.29
## DS_ID_5 0.20 0.06 0.23 0.16 0.35 0.24 0.16 0.15
## DS_ID_6 0.21 0.15 0.42 0.29 0.30 0.50 0.38 0.51
## DS_ID_7 0.23 0.13 0.25 0.25 0.38 0.31 0.34 0.31
## DS_OD_1 0.27 0.39 0.46 0.28 0.32 0.29 0.40 0.55
## DS_OD_2 1.00 0.16 0.60 0.40 0.41 0.38 0.58 0.31
## DS_OD_3 0.16 1.00 0.08 0.22 0.34 0.37 0.22 0.19
## DS_OD_4 0.60 0.08 1.00 0.46 0.38 0.34 0.60 0.51
## DS_OD_5 0.40 0.22 0.46 1.00 0.41 0.40 0.41 0.42
## DS_OD_6 0.41 0.34 0.38 0.41 1.00 0.57 0.54 0.48
## DS_OD_7 0.38 0.37 0.34 0.40 0.57 1.00 0.52 0.52
## DS_OD_8 0.58 0.22 0.60 0.41 0.54 0.52 1.00 0.42
## DS_OD_9 0.31 0.19 0.51 0.42 0.48 0.52 0.42 1.00
## DS_OD_10 0.19 0.25 0.22 0.28 0.11 0.10 0.29 0.43
## DS_OD_11 0.48 0.12 0.46 0.43 0.53 0.50 0.62 0.46
## DS_OD_12 0.38 0.29 0.38 0.38 0.44 0.41 0.39 0.67
##          DS_OD_10 DS_OD_11 DS_OD_12
## DS_ID_1 0.13 0.36 0.28
## DS_ID_2 0.10 0.47 0.36
## DS_ID_3 0.13 0.25 0.37
## DS_ID_4 0.08 0.18 0.34
## DS_ID_5 0.09 0.20 0.26
## DS_ID_6 0.10 0.37 0.38
## DS_ID_7 0.14 0.29 0.37
## DS_OD_1 0.50 0.39 0.67
## DS_OD_2 0.19 0.48 0.38
## DS_OD_3 0.25 0.12 0.29
## DS_OD_4 0.22 0.46 0.38
## DS_OD_5 0.28 0.43 0.38
## DS_OD_6 0.11 0.53 0.44
## DS_OD_7 0.10 0.50 0.41
## DS_OD_8 0.29 0.62 0.39
## DS_OD_9 0.43 0.46 0.67
## DS_OD_10 1.00 0.38 0.60
## DS_OD_11 0.38 1.00 0.46
## DS_OD_12 0.60 0.46 1.00

A visualization can be helpful for seeing patterns.

# Notice that the figure size is adjusted.
corrplot::corrplot(DS_cor, method = 'color')

[Figure: corrplot heat map of DS_cor using method = 'color']

corrplot::corrplot(DS_cor, order = 'AOE')

[Figure: corrplot of DS_cor with variables ordered by angular order of eigenvectors (AOE)]

corrplot::corrplot(DS_cor, order = 'hclust', addrect = 5)

[Figure: corrplot of DS_cor ordered by hierarchical clustering, with five rectangles drawn around the clusters]

There are many other options for displaying the correlation matrix, which you can see here: https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html

How many factors are there?

ev <- eigen(DS_cor) # get eigenvalues
ev$values

## [1] 7.3216335 1.8875037 1.4382953 1.1918764 1.0513941 0.9402367 0.7320710
## [8] 0.6899644 0.6261405 0.5065023 0.4623007 0.4337202 0.3475235 0.3164977
## [15] 0.2865786 0.2289359 0.2207224 0.1679963 0.1501070

psych::scree(DS_data)

[Figure: scree plot of the eigenvalues for DS_data]

psych::fa.parallel(DS_data, fa = "fa")

[Figure: parallel analysis scree plot for DS_data]

## Parallel analysis suggests that the number of factors = 6 and the number of components = NA

In this case it seems there are more than the two factors implied by the two scales; the parallel analysis suggests six, so we fit a six-factor model first.

DS_data |>
  factanal(factors = 6, scores = "Bartlett") -> fa6
fa6

##
## Call:
## factanal(x = DS_data, factors = 6, scores = "Bartlett")
##
## Uniquenesses:
## DS_ID_1 DS_ID_2 DS_ID_3 DS_ID_4 DS_ID_5 DS_ID_6 DS_ID_7 DS_OD_1
## 0.591 0.288 0.430 0.609 0.005 0.371 0.624 0.379
## DS_OD_2 DS_OD_3 DS_OD_4 DS_OD_5 DS_OD_6 DS_OD_7 DS_OD_8 DS_OD_9
## 0.497 0.005 0.005 0.659 0.439 0.346 0.337 0.346
## DS_OD_10 DS_OD_11 DS_OD_12
## 0.469 0.392 0.173
##
## Loadings:
##          Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## DS_ID_1 0.248 0.459 0.301 0.202
## DS_ID_2 0.320 0.695 0.350
## DS_ID_3 0.610 0.156 0.186 0.349
## DS_ID_4 0.474 0.135 0.367
## DS_ID_5 0.202 0.970
## DS_ID_6 0.241 0.730 0.151 0.118
## DS_ID_7 0.271 0.256 0.177 0.452
## DS_OD_1 0.187 0.226 0.659 0.213 0.228
## DS_OD_2 0.587 0.155 0.113 0.338
## DS_OD_3 0.153 0.207 0.959
## DS_OD_4 0.445 0.246 0.210 0.826
## DS_OD_5 0.449 0.155 0.244 0.204
## DS_OD_6 0.622 0.186 0.181 0.258 0.199
## DS_OD_7 0.576 0.496 0.102 0.224
## DS_OD_8 0.735 0.195 0.171 0.225
## DS_OD_9 0.337 0.448 0.559 0.163
## DS_OD_10 0.152 0.702
## DS_OD_11 0.684 0.222 0.279
## DS_OD_12 0.278 0.256 0.809 0.161
##
##                Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## SS loadings 3.057 2.717 2.321 1.673 1.212 1.054
## Proportion Var 0.161 0.143 0.122 0.088 0.064 0.055
## Cumulative Var 0.161 0.304 0.426 0.514 0.578 0.633
##
## Test of the hypothesis that 6 factors are sufficient.
## The chi square statistic is 230.2 on 72 degrees of freedom.
## The p-value is 1.96e-18

For contrast, we can also see what the 2-factor solution looks like.

DS_data |>
  factanal(factors = 2) -> fa2
fa2

##
## Call:
## factanal(x = DS_data, factors = 2)
##
## Uniquenesses:
## DS_ID_1 DS_ID_2 DS_ID_3 DS_ID_4 DS_ID_5 DS_ID_6 DS_ID_7 DS_OD_1
## 0.605 0.342 0.616 0.695 0.715 0.521 0.728 0.436
## DS_OD_2 DS_OD_3 DS_OD_4 DS_OD_5 DS_OD_6 DS_OD_7 DS_OD_8 DS_OD_9
## 0.702 0.862 0.597 0.709 0.599 0.494 0.559 0.408
## DS_OD_10 DS_OD_11 DS_OD_12
## 0.474 0.551 0.289
##
## Loadings:
##          Factor1 Factor2
## DS_ID_1 0.607 0.161
## DS_ID_2 0.795 0.161
## DS_ID_3 0.574 0.233
## DS_ID_4 0.536 0.133
## DS_ID_5 0.531
## DS_ID_6 0.648 0.244
## DS_ID_7 0.465 0.235
## DS_OD_1 0.207 0.722
## DS_OD_2 0.402 0.370
## DS_OD_3 0.154 0.338
## DS_OD_4 0.459 0.439
## DS_OD_5 0.337 0.421
## DS_OD_6 0.504 0.383
## DS_OD_7 0.641 0.309
## DS_OD_8 0.491 0.447
## DS_OD_9 0.408 0.652
## DS_OD_10 0.722
## DS_OD_11 0.458 0.489
## DS_OD_12 0.272 0.798
##
##                Factor1 Factor2
## SS loadings 4.450 3.649
## Proportion Var 0.234 0.192
## Cumulative Var 0.234 0.426
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 721.19 on 134 degrees of freedom.
## The p-value is 3.24e-81

We use rotations to simplify the representation of the factors.

There are many options; let’s use promax as an example, this time with a five-factor solution.

DS_data |>
  factanal(factors = 5, scores = "Bartlett", rotation = "promax") -> fa5p
print(fa5p, cut = 0.2)

##
## Call:
## factanal(x = DS_data, factors = 5, scores = "Bartlett", rotation = "promax")
##
## Uniquenesses:
## DS_ID_1 DS_ID_2 DS_ID_3 DS_ID_4 DS_ID_5 DS_ID_6 DS_ID_7 DS_OD_1
## 0.557 0.314 0.553 0.621 0.005 0.330 0.633 0.409
## DS_OD_2 DS_OD_3 DS_OD_4 DS_OD_5 DS_OD_6 DS_OD_7 DS_OD_8 DS_OD_9
## 0.461 0.750 0.248 0.654 0.403 0.272 0.338 0.364
## DS_OD_10 DS_OD_11 DS_OD_12
## 0.464 0.473 0.193
##
## Loadings:
##          Factor1 Factor2 Factor3 Factor4 Factor5
## DS_ID_1 0.294 0.431
## DS_ID_2 0.655 0.216
## DS_ID_3 0.578
## DS_ID_4 0.421 0.281
## DS_ID_5 1.074
## DS_ID_6 0.895 -0.247
## DS_ID_7 0.412
## DS_OD_1 0.729
## DS_OD_2 0.764
## DS_OD_3 0.240 0.452
## DS_OD_4 0.895 -0.290
## DS_OD_5 0.411
## DS_OD_6 0.250 0.590
## DS_OD_7 0.356 0.736
## DS_OD_8 0.701 0.280
## DS_OD_9 0.499 0.349
## DS_OD_10 0.862 -0.244
## DS_OD_11 0.443 0.307
## DS_OD_12 0.871
##
##                Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings 2.478 2.441 2.330 1.628 1.544
## Proportion Var 0.130 0.128 0.123 0.086 0.081
## Cumulative Var 0.130 0.259 0.382 0.467 0.548
##
## Factor Correlations:
##         Factor1 Factor2 Factor3 Factor4 Factor5
## Factor1 1.000 0.367 -0.271 -0.568 -0.346
## Factor2 0.367 1.000 -0.571 -0.543 -0.559
## Factor3 -0.271 -0.571 1.000 0.476 0.543
## Factor4 -0.568 -0.543 0.476 1.000 0.470
## Factor5 -0.346 -0.559 0.543 0.470 1.000
##
## Test of the hypothesis that 5 factors are sufficient.
## The chi square statistic is 323.4 on 86 degrees of freedom.
## The p-value is 3.33e-29

One of the interesting things to notice is that DS_OD_3 no longer stands apart the way it did when we looked at the bivariate correlation matrix. The same is true for some of the other clusters that the rectangles highlighted. This is because factor analysis is a multivariate method that controls for many variables at once, and when you do that the bivariate relationships can change.

One of the decisions about rotations is whether the factors should be allowed to be correlated or must be uncorrelated (orthogonal) to each other. The promax rotation allows them to be correlated. To see the implications of this we can compare the correlations of the factors created in fa6 and fa5p.

round(cor(fa6$scores), 2)

##         Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## Factor1 1.00 -0.11 -0.07 0.01 -0.02 -0.09
## Factor2 -0.11 1.00 -0.04 -0.04 0.00 0.01
## Factor3 -0.07 -0.04 1.00 0.01 -0.02 0.00
## Factor4 0.01 -0.04 0.01 1.00 0.00 0.00
## Factor5 -0.02 0.00 -0.02 0.00 1.00 0.02
## Factor6 -0.09 0.01 0.00 0.00 0.02 1.00

round(cor(fa5p$scores), 2)

##         Factor1 Factor2 Factor3 Factor4 Factor5
## Factor1 1.00 0.50 0.46 0.35 0.46
## Factor2 0.50 1.00 0.41 0.25 0.47
## Factor3 0.46 0.41 1.00 0.54 0.34
## Factor4 0.35 0.25 0.54 1.00 0.31
## Factor5 0.46 0.47 0.34 0.31 1.00

From a criminology perspective we would probably expect that deviance of different types would be correlated, so it is likely we would use the rotated scores.

We can use these scores for further analysis or we could use summary scales with Cronbach’s α the way we did previously.
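For instance, here is a minimal sketch (an addition for illustration, not part of the original chapter) of attaching the promax factor scores back onto the complete-case data so they can be used as variables in later models; it assumes, as above, that fa5p was fit on the complete cases of ethics1.

# Attach the promax factor scores to the complete-case data.
ethics_cc <- ethics1[complete.cases(ethics1), ]
ethics_cc <- cbind(ethics_cc, fa5p$scores)
# The scores are now ordinary columns (Factor1 ... Factor5).
head(ethics_cc[, c("DS_ID", "DS_OD", "Factor1", "Factor2")])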


A Primer on Factor Analysis in Research using Reproducible R Software

Abdisalam Hassan Muse (PhD)

Amoud University

This primer provides an overview of factor analysis in research, covering the meaning and assumptions of factor analysis, as well as the differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The procedure for conducting factor analysis is explained, with a focus on the role of the correlation matrix and a general model of the correlation matrix of individual variables. The paper covers methods for extracting factors, including principal component analysis (PCA), and criteria for determining the number of factors to extract, such as comprehensibility, the Kaiser criterion, variance explained criteria, Cattell’s scree plot, and Horn’s parallel analysis (PA). The meaning and interpretation of communality and eigenvalues are discussed, as well as factor loading and rotation methods such as varimax. The paper also covers the meaning and interpretation of factor scores and their use in subsequent analyses. The R software is used throughout the paper to provide reproducible examples and code for conducting factor analysis.

Introduction

Factor analysis is a statistical technique commonly used in research to identify underlying dimensions or constructs that explain the variability among a set of observed variables. It is often used to reduce the complexity of a dataset by summarizing a large number of variables into a smaller set of factors that are easier to understand and analyze. Factor analysis is widely used in fields such as psychology, education, marketing, and social sciences to explore the relationships between variables and to identify underlying latent constructs.

In this tutorial paper, we will provide an overview of factor analysis, including its meaning and assumptions, the differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), and the procedure for conducting factor analysis. We will also cover the role of the correlation matrix and a general model of the correlation matrix of individual variables.

The paper will discuss methods for extracting factors, including principal component analysis (PCA), and criteria for determining the number of factors to extract, such as comprehensibility, the Kaiser criterion, variance explained criteria, Cattell’s scree plot, and Horn’s parallel analysis (PA). The meaning and interpretation of communality and eigenvalues will be discussed, as well as factor loading and rotation methods such as varimax.

Finally, the paper will cover the meaning and interpretation of factor scores and their use in subsequent analyses. The R software will be used throughout the paper to provide reproducible examples and code for conducting factor analysis. By the end of this tutorial paper, readers will have a better understanding of the fundamentals of factor analysis and how to apply it in their research.

Module I: Factor Analysis in Research

Meaning of factor analysis

Factor analysis is a statistical method that is widely used in research to identify the underlying factors that explain the variations in a set of observed variables. The method is particularly useful in fields such as psychology, sociology, marketing, and education, where researchers often deal with complex datasets that contain many variables. The basic idea behind factor analysis is to identify the common factors that underlie a set of observed variables. By identifying these factors, researchers can reduce the number of variables they need to analyze, simplify the data, and gain insights into the underlying structure of the data.

Factor analysis can be used in two main ways: exploratory and confirmatory.

Exploratory factor analysis is used when the researcher does not have a priori knowledge of the underlying factors and wants to identify them from the data.

Confirmatory factor analysis, on the other hand, is used when the researcher has a specific hypothesis about the underlying factors and wants to test this hypothesis using the data.

Factor analysis has several advantages over other statistical methods. It can help researchers identify the most important variables in a dataset, reduce the number of variables they need to analyze, and provide insights into the relationships between variables. However, it also has some limitations and assumptions that must be taken into account when applying the method.

In this primer or tutorial paper, we will provide an overview of factor analysis, its applications in research, and the steps involved in performing factor analysis. We will also discuss the assumptions and limitations of the method, as well as methods for interpreting and visualizing the results. Finally, we will provide several examples of factor analysis in different fields of research, illustrating how the method can be used to extract meaningful information from complex datasets.

Assumptions of Factor Analysis

Factor analysis is a statistical technique that is used to identify the underlying factors that explain the correlations between a set of observed variables. In order to obtain valid results from factor analysis, certain assumptions must be met. Here are some of the key assumptions of factor analysis:

Normality : Factor analysis assumes that the data is normally distributed. If the data is not normally distributed, then the results of the analysis may be biased or unreliable. Normality can be checked using statistical tests such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test.

Linearity : Factor analysis assumes that the relationships between the observed variables and the underlying factors are linear. If the relationships are non-linear, then the results of the analysis may be biased or unreliable.

Sample size : Factor analysis assumes that the sample size is sufficient to obtain reliable estimates of the factor model. A rule of thumb is to have at least 10 observations per variable, although some researchers recommend a larger sample size.

Absence of multicollinearity : Factor analysis assumes that there is no multicollinearity among the observed variables. Multicollinearity occurs when two or more variables are highly correlated with each other, which can lead to unstable estimates of the factor model.

Adequate factor loading : Factor analysis assumes that there are strong associations (i.e., factor loadings) between the observed variables and the underlying factors. Weak factor loadings may indicate that the observed variables are not good indicators of the underlying factors, or that there are too few factors in the model.

In summary, factor analysis is a powerful technique for identifying the underlying factors that explain the correlations between a set of observed variables. However, the assumptions of normality, linearity, sample size, absence of multicollinearity, and adequate factor loading must be met in order to obtain valid results.
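To make these checks concrete, here is a minimal sketch in R (an illustration added here, not a prescribed diagnostic battery), using the 25 personality items of the bfi dataset from the psych package as a stand-in. The KMO measure and Bartlett’s test of sphericity are common supplementary checks of whether a correlation matrix is factorable.

library(psych)

items <- na.omit(bfi[, 1:25]) # the first 25 columns are the personality items

# Normality: Shapiro-Wilk p-value for each item.
round(apply(items, 2, function(v) shapiro.test(v)$p.value), 3)

# Multicollinearity: a determinant of the correlation matrix near zero
# signals severe multicollinearity.
det(cor(items))

# Factorability: Kaiser-Meyer-Olkin measure and Bartlett's test of sphericity.
KMO(cor(items))
cortest.bartlett(cor(items), n = nrow(items))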

EFA and CFA Factor Analysis Procedures

Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) are two types of factor analysis procedures that are used to identify the underlying factors that explain the correlations between a set of observed variables.

Exploratory Factor Analysis (EFA) is used when there is no prior theory about the underlying factors, and the goal is to identify the factors that explain the correlations between variables. In EFA, the researcher starts with a set of observed variables and uses statistical techniques to identify the most important factors that explain the correlations between them. The researcher does not have any preconceived notion about the number of factors or how they are related to each other. The aim is to identify the underlying structure of the data and to reduce the number of variables that need to be analyzed.

Confirmatory Factor Analysis (CFA), on the other hand, is used when there is a specific theory about the underlying factors, and the goal is to test this theory using the data. In CFA, the researcher starts with a pre-specified model that specifies the number of factors and how they are related to each other. The aim is to test the theory and to determine whether the observed data fit the model. The researcher tests the model using a variety of statistical techniques and evaluates the goodness-of-fit of the model.

EFA is an unsupervised learning method that is used to explore the data and identify the underlying structure. The goal of EFA is to identify the most important factors that explain the correlations between variables and to reduce the number of variables that need to be analyzed. The researcher does not have any preconceived notion about the number of factors or how they are related to each other. EFA involves several steps, such as selecting the appropriate method for factor extraction, determining the number of factors to retain, and selecting the method for factor rotation. The goal is to identify the simplest factor structure that best explains the data.

CFA, on the other hand, is a supervised learning method that is used to confirm or refute a pre-specified theory. The goal of CFA is to evaluate the degree to which the observed data fit the pre-specified model that specifies the proposed number of factors and their relationships. The researcher first creates a model that specifies the proposed number of factors and their relationships, and then tests the fit of the model to the observed data. The researcher can use a variety of statistical techniques to evaluate the goodness-of-fit of the model, such as chi-square tests, comparative fit index (CFI), Tucker-Lewis Index (TLI), and root mean square error of approximation (RMSEA).

Both EFA and CFA require the researcher to consider several assumptions of factor analysis, such as normality, linearity, absence of multicollinearity, and adequate factor loading. Violations of these assumptions can result in biased or unreliable results. Therefore, it is important to conduct appropriate data screening, model testing, and model modification to ensure that the assumptions are met.

In summary, EFA is an exploratory technique used to identify the underlying factors that explain the correlations between variables, while CFA is a confirmatory technique used to test a pre-specified theory about the underlying factors and their relationships. Both procedures require careful consideration of the assumptions of factor analysis and appropriate statistical techniques for model evaluation.
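As a concrete sketch of the CFA workflow just described, the following uses the lavaan package and its built-in HolzingerSwineford1939 data; the dataset and model are stand-ins chosen for illustration, not material from this paper.

library(lavaan)

# Hypothesized three-factor measurement model.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit <- cfa(model, data = HolzingerSwineford1939)

# Model chi-square and the fit indices mentioned above.
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea"))

# Full output with standardized loadings.
summary(fit, fit.measures = TRUE, standardized = TRUE)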

Comparison between EFA and CFA

Note that EFA and CFA are both types of factor analysis, but they differ in their goals, assumptions, and methods. EFA is used to explore the underlying structure of a set of observed variables, while CFA is used to test a specific model of the relationships between the observed variables and latent factors. EFA allows the number of factors to be determined by the data, while CFA requires the number of factors to be specified a priori. EFA allows the factor loadings to vary across different samples or variables, while CFA assumes that the factor loadings are fixed and known a priori. EFA is exploratory and can be used to generate hypotheses, while CFA is confirmatory and can be used to test specific hypotheses.

Real-life Examples

Below are some real-life examples and research titles that illustrate the differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA):


1. Education sector:

Real-life example : A researcher is interested in understanding the factors that influence student engagement in online learning. They collect data on various variables such as perceived usefulness, ease of use, and social presence.

Research title for EFA : “Exploring the underlying factors of student engagement in online learning: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of student engagement in online learning: A confirmatory factor analysis approach”

Explanation : In this example, the researcher may use EFA to explore the underlying structure of the observed variables related to student engagement in online learning and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the researcher may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to student engagement in online learning. In this case, the research title would reflect the approach used in the analysis.

2. Psychology sector:

Real-life example : A psychologist is interested in understanding the factors that contribute to anxiety in adolescents. They collect data on various variables such as stress, self-esteem, and social support.

Research title for EFA : “Identifying the underlying factors of anxiety in adolescents: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of anxiety in adolescents: A confirmatory factor analysis approach”

Explanation : In this example, the psychologist may use EFA to identify the underlying structure of the observed variables related to anxiety in adolescents and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the psychologist may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to anxiety in adolescents. In this case, the research title would reflect the approach used in the analysis.

3. Law sector:

Real-life example : A law firm is interested in understanding the factors that contribute to job satisfaction among their employees. They collect data on various variables such as work-life balance, compensation, and career advancement opportunities.

Research title for EFA : “Exploring the underlying factors of job satisfaction among law firm employees: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of job satisfaction among law firm employees: A confirmatory factor analysis approach”

Explanation : In this example, the law firm may use EFA to explore the underlying structure of the observed variables related to job satisfaction among their employees and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the law firm may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to job satisfaction among their employees. In this case, the research title would reflect the approach used in the analysis.

4. Medicine sector:

Real-life example : A physician is interested in understanding the factors that contribute to patient satisfaction with their healthcare experience. They collect data on various variables such as communication with healthcare providers, access to care, and quality of care.

Research title for EFA : “Identifying the underlying factors of patient satisfaction with healthcare: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of patient satisfaction with healthcare: A confirmatory factor analysis approach”

Explanation : In this example, the physician may use EFA to identify the underlying structure of the observed variables related to patient satisfaction with healthcare and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the physician may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to patient satisfaction with healthcare. In this case, the research title would reflect the approach used in the analysis.

5. Engineering sector:

Real-life example : A company is interested in understanding the factors that contribute to customer satisfaction with their products. They collect data on various variables such as product quality, design, and reliability.

Research title for EFA : “Exploring the underlying factors of customer satisfaction with engineering products: An exploratory factor analysis approach”

Research title for CFA : “Developing and validating a model of customer satisfaction with engineering products: A confirmatory factor analysis approach”

Explanation : In this example, the company may use EFA to explore the underlying structure of the observed variables related to customer satisfaction with their engineering products and generate hypotheses about the relationships between the observed variables and latent factors. They may then use the results of the EFA to develop a new customer satisfaction survey. Alternatively, the company may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to customer satisfaction with their engineering products, and validate the new customer satisfaction survey. In this case, the research title would reflect the approach used in the analysis.

6. Public health sector:

Real-life example : A public health researcher is interested in understanding the factors that contribute to health-related quality of life among older adults. They collect data on various variables such as physical functioning, mental health, and social support.

Research title for EFA : “Exploring the underlying factors of health-related quality of life among older adults: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of health-related quality of life among older adults: A confirmatory factor analysis approach”

Explanation : In this example, the public health researcher may use EFA to explore the underlying structure of the observed variables related to health-related quality of life among older adults and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the researcher may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to health-related quality of life among older adults. In this case, the research title would reflect the approach used in the analysis.

7. Finance sector:

Real-life example : A financial analyst is interested in understanding the factors that contribute to stock prices. They collect data on various variables such as earnings per share, market capitalization, and price-earnings ratio.

Research title for EFA : “Exploring the underlying factors of stock prices: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of stock prices: A confirmatory factor analysis approach”

Explanation : In this example, the financial analyst may use EFA to explore the underlying structure of the observed variables related to stock prices and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the analyst may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to stock prices. In this case, the research title would reflect the approach used in the analysis.

8. Project Management sector:

Real-life example : A project manager is interested in understanding the factors that contribute to project success. They collect data on various variables such as project scope, budget, and stakeholder engagement.

Research title for EFA : “Exploring the underlying factors of project success: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of project success: A confirmatory factor analysis approach”

Explanation : In this example, the project manager may use EFA to explore the underlying structure of the observed variables related to project success and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the manager may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to project success. In this case, the research title would reflect the approach used in the analysis.

9. Monitoring and Evaluation (M&E) sector:

Real-life example : An M&E specialist is interested in understanding the factors that contribute to program effectiveness. They collect data on various variables such as program inputs, activities, and outcomes.

Research title for EFA : “Identifying the underlying factors of program effectiveness: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of program effectiveness: A confirmatory factor analysis approach”

Explanation : In this example, the M&E specialist may use EFA to identify the underlying structure of the observed variables related to program effectiveness and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the specialist may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to program effectiveness. In this case, the research title would reflect the approach used in the analysis.

10. Data Science sector:

Real-life example : A data scientist is interested in understanding the factors that contribute to customer churn. They collect data on various variables such as customer demographics, usage patterns, and customer service interactions.

Research title for EFA : “Exploring the underlying factors of customer churn: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of customer churn: A confirmatory factor analysis approach”

Explanation : In this example, the data scientist may use EFA to explore the underlying structure of the observed variables related to customer churn and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the scientist may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to customer churn. In this case, the research title would reflect the approach used in the analysis.

Overall, the choice between EFA and CFA depends on the research question, the goals of the analysis, and the availability of prior knowledge or theoretical frameworks, regardless of the sector in which the research is being conducted. EFA is useful for exploratory analyses where the underlying structure of the observed variables is unknown or needs to be better understood, while CFA is useful for confirmatory analyses where a pre-specified model of the relationships between the observed variables and latent constructs is available or needs to be tested.

Module II: Correlation Matrix

Role of a correlation matrix in factor analysis

The correlation matrix is a critical component in factor analysis as it provides information about the relationships between the observed variables. In factor analysis, the goal is to identify the underlying factors that explain the correlations between the observed variables. The correlation matrix provides the information needed to identify the factors.

Factor analysis assumes that the observed variables are correlated because they share common underlying factors. The correlation matrix provides information about the strength and direction of these correlations. The strength of the correlation between two variables indicates how closely they are related, while the sign of the correlation (positive or negative) indicates the direction of the relationship. A positive correlation indicates that the variables tend to increase or decrease together, while a negative correlation indicates that the variables tend to move in opposite directions. Factor analysis uses the correlation matrix to estimate the factor loadings, which represent the degree to which each observed variable is associated with each underlying factor. The factor loadings are used to construct the factor structure, which represents the underlying factors and their relationships. The factor structure can be rotated to simplify and clarify the interpretation of the factors.

In summary, the correlation matrix is a key component in factor analysis as it provides the information needed to identify the underlying factors that explain the correlations between the observed variables. The factor loadings are estimated using the correlation matrix, and the factor structure is constructed based on the estimated loadings. The correlation matrix, therefore, plays a critical role in the factor analysis process.
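To make this concrete, note that base R’s factanal() can be fit directly from a correlation matrix rather than raw data; the sketch below uses a handful of mtcars variables purely as a stand-in.

# Factor analysis from a correlation matrix; n.obs is needed
# for the likelihood-based test of the number of factors.
R <- cor(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
fit <- factanal(covmat = R, factors = 2, n.obs = nrow(mtcars))
fit$loadings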

A general model of a correlation matrix of individual variables

A correlation matrix is a square matrix that shows the correlation coefficients between a set of individual variables. The general model of a correlation matrix can be expressed as follows:

\[\begin{equation} C = \begin{bmatrix} c_{1,1} & c_{1,2} & c_{1,3} & \cdots & c_{1,k} \\ c_{2,1} & c_{2,2} & c_{2,3} & \cdots & c_{2,k} \\ c_{3,1} & c_{3,2} & c_{3,3} & \cdots & c_{3,k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_{k,1} & c_{k,2} & c_{k,3} & \cdots & c_{k,k} \end{bmatrix} \end{equation}\]

where \(C\) is the correlation matrix, \(k\) is the number of individual variables, and \(c_{i,j}\) is the correlation coefficient between the \(i\) -th and \(j\) -th variables.

The diagonal elements of the correlation matrix represent the correlations between each variable and itself, which are always equal to 1. The off-diagonal elements represent the correlations between different pairs of variables. The correlation coefficient can range from -1 to 1, where a value of -1 indicates a perfect negative correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation.

The correlation matrix can be used for various purposes, such as identifying clusters of correlated variables, detecting multicollinearity, and exploring the underlying factor structure using factor analysis. It is important to note that the correlation matrix assumes that the variables are continuous, linearly related, and normally distributed. Violations of these assumptions can affect the validity and reliability of the correlation matrix and its interpretation.

Interpreting the correlation matrix

Interpreting a correlation matrix involves examining the strength and direction of the correlations between pairs of variables. The correlation matrix provides a summary of the relationships between the variables, and understanding these relationships is important for many statistical analyses, including regression, factor analysis, and structural equation modeling.

The strength of the correlation is indicated by the absolute value of the correlation coefficient. A correlation coefficient of 0 indicates no relationship between the variables, while a correlation coefficient of 1 (or -1) indicates a perfect positive (or negative) correlation. Correlation coefficients between 0 and 1 (or 0 and -1) indicate varying degrees of positive (or negative) correlation.

The direction of the correlation is indicated by the sign of the correlation coefficient. A positive correlation indicates that the variables tend to increase or decrease together, while a negative correlation indicates that the variables tend to move in opposite directions.

It is also important to consider the context of the variables being analyzed when interpreting the correlation matrix. For example, a correlation of 0.3 between two variables may be considered strong in one context and weak in another context.

Additionally, the correlation matrix does not imply causation, and caution should be exercised when interpreting correlations as evidence of causation. In some cases, it may be necessary to prepare the data before computing or interpreting the matrix. Pearson correlations are unaffected by linear rescaling of the variables, but if the analysis will instead be based on the covariance matrix, variables measured on very different scales may need to be standardized first. Likewise, outliers or missing data may need to be addressed before interpreting the correlation matrix.

In summary, interpreting the correlation matrix involves examining the strength and direction of the correlations between pairs of variables and considering the context of the variables being analyzed. It is important to remember that the correlation matrix does not imply causation and that adjustments may be necessary before interpreting the matrix.

Example using R Code

Here is R code for computing a correlation matrix and interpreting the results using a built-in dataset in R:
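A minimal version of that code:

# Load the built-in iris dataset and compute the correlation matrix
# for its four numeric variables.
data(iris)
cor_matrix <- cor(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])
round(cor_matrix, 2)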

In this example, we loaded the built-in iris dataset in R and computed the correlation matrix between the variables Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width using the cor() function. We then printed the correlation matrix and interpreted the results by examining the pairwise correlations between the variables.

Module III: Factor Extraction

Factor extraction methods are used in factor analysis to identify the underlying factors that explain the correlations between a set of observed variables. There are several methods for extracting factors, including:

Principal Component Analysis (PCA) : PCA is a data reduction technique that extracts factors based on the variance in the observed variables. PCA identifies factors that account for the maximum amount of variance in the data and is useful when the goal is to reduce the number of variables in the analysis.

Common Factor Analysis : This method extracts factors based on the shared variance among the observed variables. It assumes that the observed variables are influenced by a smaller number of common factors, which are responsible for the correlations among the variables. (The abbreviation CFA is usually reserved for confirmatory factor analysis.)

Maximum Likelihood (ML) : ML is a statistical technique that estimates the parameters of a statistical model by maximizing the likelihood function. ML is commonly used in CFA and Structural Equation Modeling (SEM) to estimate the factor loadings and other model parameters.

Principal Axis Factoring (PAF) : PAF is a method that extracts factors based on the common variance among the observed variables. PAF assumes that each variable contributes to the factor structure in proportion to its common variance with the other variables.

Unweighted Least Squares (ULS) : ULS is a method that extracts factors based on the correlations among the observed variables. ULS is commonly used in CFA and SEM to estimate the factor loadings and other model parameters.

Maximum Variance (MV) : MV is a method that extracts factors based on the maximum variance in the observed variables. MV is similar to PCA but is less commonly used in factor analysis.

The choice of factor extraction method depends on the research question, the nature and structure of the data, and the assumptions underlying the method. It is important to carefully consider the strengths and limitations of each method and to select a method that is appropriate for the research question and the data at hand.

Example 1

This example demonstrates different factor extraction methods using the “bfi” dataset in the “psych” package in R:
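A sketch of what such code might look like (a reconstruction consistent with the description that follows, using the method codes that psych’s fa() function actually accepts):

library(psych)

bfi_items <- na.omit(bfi[, 1:25]) # the first 25 columns are the personality items

# Principal Component Analysis.
pca_fit <- principal(bfi_items, nfactors = 2, rotate = "varimax")

# Common factor extraction with different estimation methods.
ml_fit  <- fa(bfi_items, nfactors = 2, fm = "ml",  rotate = "varimax") # maximum likelihood
paf_fit <- fa(bfi_items, nfactors = 2, fm = "pa",  rotate = "varimax") # principal axis
uls_fit <- fa(bfi_items, nfactors = 2, fm = "uls", rotate = "varimax") # unweighted least squares

print(pca_fit)
print(ml_fit)
print(paf_fit)
print(uls_fit)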

In this example, we are using the “bfi” dataset from the “psych” package and selecting a subset of variables to use for the factor analysis. We then apply several factor extraction methods: Principal Component Analysis (PCA) via principal(), and common factor extraction via fa() using Maximum Likelihood (ML), Principal Axis Factoring (PAF), and Unweighted Least Squares (ULS).

For each method, we specify two factors and use the varimax rotation method to simplify the factor structure. We then print out the results for each method using the “print” function.

Note that the “fm” argument in the “fa” function specifies the factor extraction method: “ml” for maximum likelihood, “pa” for principal axis factoring, and “uls” for unweighted least squares. If no “fm” argument is specified, the default extraction method in “fa” is “minres” (minimum residual).

The interpretation of the factor analysis results depends on the specific method used and the research question. In general, the summary output provides information about the factor loadings, communalities, eigenvalues, and other relevant statistics.

The factor diagram and biplot can help visualize the relationships between the variables and the factors.

It is important to carefully examine the results and to consider the assumptions of each method before interpreting the factor analysis results.

Determining the number of factors to be extracted

Determining the number of factors to be extracted in a factor analysis is an important step that involves evaluating the fit of the model and selecting the appropriate number of factors. There are several methods for determining the number of factors, including:

Comprehensibility : This method involves selecting the number of factors that make the most sense conceptually or theoretically. For example, if the research question involves identifying the underlying dimensions of a personality test, the number of factors may be based on the number of personality traits that are hypothesized to exist.

Kaiser Criterion : This method involves selecting the number of factors with eigenvalues greater than 1.0, which is based on the assumption that each factor should account for at least as much variance as one of the original variables. However, this method may overestimate the number of factors, particularly when there are many variables in the analysis.

Variance Explained Criteria : This method involves selecting the number of factors that explain a certain percentage of the total variance in the data. For example, a researcher may decide to retain factors that collectively explain at least 60% or 70% of the variance in the data.

Cattell’s Scree Plot : This method involves plotting the eigenvalues of the factors in descending order and selecting the number of factors at the “elbow” of the plot, which represents the point at which the eigenvalues start to level off. However, this method can be subjective and may be influenced by the researcher’s interpretation of the plot.

Horn’s Parallel Analysis : This method involves comparing the eigenvalues of the factors in the actual data to the eigenvalues of factors in randomly generated data with the same sample size and number of variables. The number of factors to retain is based on the eigenvalues of the actual data that exceed the eigenvalues of the randomly generated data. This method is considered to be one of the most accurate methods for determining the number of factors.

In summary, determining the number of factors to be extracted involves evaluating the fit of the model and selecting the appropriate number of factors based on a combination of methods, including comprehensibility, Kaiser criterion, variance explained criteria, Cattell’s scree plot, and Horn’s parallel analysis. It is important to carefully consider the strengths and limitations of each method and to select a method that is appropriate for the research question and the data at hand.
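A minimal sketch of these criteria in R, again using the bfi items from the psych package purely as a stand-in dataset:

library(psych)

bfi_items <- na.omit(bfi[, 1:25])

# Kaiser criterion: count eigenvalues of the correlation matrix above 1.
ev <- eigen(cor(bfi_items))$values
sum(ev > 1)

# Variance explained: cumulative proportion of total variance.
round(cumsum(ev) / length(ev), 2)

# Cattell's scree plot.
scree(bfi_items)

# Horn's parallel analysis.
fa.parallel(bfi_items, fa = "fa")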

Real-life Example 1

Suppose a researcher is interested in identifying the underlying factors that explain the responses to a questionnaire about job satisfaction. The questionnaire includes 20 items that measure various aspects of job satisfaction, such as salary, work environment, and work-life balance.

Comprehensibility : The researcher may start by considering the theoretical or conceptual structure of job satisfaction. For example, if previous research has identified three dimensions of job satisfaction (i.e., intrinsic, extrinsic, and relational), the researcher may decide to extract three factors.

Kaiser Criterion : The researcher may perform a factor analysis and examine the eigenvalues of the factors. If the first three factors have eigenvalues greater than 1.0, the researcher may decide to extract three factors.

Variance Explained Criteria : The researcher may decide to extract the number of factors that explain a certain percentage of the total variance. For example, the researcher may decide to extract the number of factors that collectively explain at least 60% or 70% of the variance in the data.

Cattell’s Scree Plot : The researcher may plot the eigenvalues of the factors in descending order and select the number of factors at the “elbow” of the plot. For example, if the eigenvalues start to level off after the third factor, the researcher may decide to extract three factors.

Horn’s Parallel Analysis : The researcher may compare the eigenvalues of the factors in the actual data to the eigenvalues of factors in randomly generated data with the same sample size and number of variables. If the eigenvalues of the actual data exceed the eigenvalues of the randomly generated data for the first three factors, the researcher may decide to extract three factors.

In this example, the different methods could in principle lead to different results, but here they converge: comprehensibility and previous research suggest three factors; the Kaiser criterion, variance explained criteria, and Cattell’s scree plot indicate that three factors are appropriate; and Horn’s parallel analysis may also support the extraction of three factors.

The choice of which method to use ultimately depends on the research question and the nature of the data. In some cases, a combination of methods may be used to determine the appropriate number of factors. For example, the researcher may consider both the theoretical structure of job satisfaction and the results of the factor analysis to decide on the appropriate number of factors to extract.

Real-life Example 2

Suppose a researcher is interested in identifying the underlying factors that explain the responses to a survey on customer satisfaction for a retail store. The survey includes 25 items that measure various aspects of customer satisfaction, such as product quality, store ambiance, customer service, and pricing.

Comprehensibility : The researcher may consider the theoretical or conceptual structure of customer satisfaction based on previous research. For example, if previous research has identified four dimensions of customer satisfaction (i.e., product quality, store ambiance, customer service, and pricing), the researcher may decide to extract four factors.

Kaiser Criterion : The researcher may perform a factor analysis and examine the eigenvalues of the factors. If the first four factors have eigenvalues greater than 1.0, the researcher may decide to extract four factors.

Variance Explained Criteria : The researcher may decide to extract the number of factors that explain a certain percentage of the total variance. For example, the researcher may decide to extract the number of factors that collectively explain at least 70% or 80% of the variance in the data.

Cattell’s Scree Plot : The researcher may plot the eigenvalues of the factors in descending order and select the number of factors at the “elbow” of the plot. For example, if the eigenvalues start to level off after the fourth factor, the researcher may decide to extract four factors.

Horn’s Parallel Analysis: The researcher may compare the eigenvalues of the factors in the actual data to the eigenvalues of factors in randomly generated data with the same sample size and number of variables. If the eigenvalues of the actual data exceed the eigenvalues of the randomly generated data for the first four factors, the researcher may decide to extract four factors.

In this example, the different methods could likewise lead to different results, but they again converge: comprehensibility and previous research suggest four factors; the Kaiser criterion, variance explained criteria, and Cattell’s scree plot indicate that four factors are appropriate; and Horn’s parallel analysis may also support the extraction of four factors. The choice of which method to use ultimately depends on the research question and the nature of the data. In some cases, a combination of methods may be used to determine the appropriate number of factors. For example, the researcher may consider both the theoretical structure of customer satisfaction and the results of the factor analysis to decide on the appropriate number of factors to extract.

Overall, the different methods for determining the number of factors to be extracted in the USArrests dataset lead to the extraction of three factors, suggesting three underlying dimensions in the crime-rate data for the United States. When interpreting these factors, it is worth remembering what USArrests actually measures: arrest rates for three violent crimes (murder, assault, and rape) plus the percentage of the population living in urban areas. The factors are therefore best interpreted in terms of those variables, for example as violent-crime and urbanization dimensions, rather than as crime types (such as property or white-collar crime) that the data do not contain. It is also important to note that the choice of which method to use ultimately depends on the research question and the nature of the data; different methods may lead to different conclusions about the appropriate number of factors to extract.

Module IV: Communality and Eigenvalues

Communality and eigenvalues are two important concepts in factor analysis. Here’s an explanation of what they are and how they are related:

Communalities : In factor analysis, communalities refer to the proportion of variance in each original variable that is accounted for by the extracted factors. Communalities range from 0 to 1, with higher values indicating that a larger proportion of the variance in the variable is explained by the factors. Communalities can be computed as the sum of the squared factor loadings for each variable.

Eigenvalues : Eigenvalues represent the amount of variance in the original variables that is explained by each factor. They are computed as the sum of the squared factor loadings for each factor. Eigenvalues are used to determine the number of factors to extract by examining the magnitude of each eigenvalue.
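
In symbols, if lambda_ij is the loading of variable i on factor j, then the communality of variable i is h_i^2 = sum over j of lambda_ij^2 (a row sum of the squared loading matrix), while the eigenvalue of factor j is the sum over i of lambda_ij^2 (the corresponding column sum).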

Communalities and eigenvalues are related in that they both represent the amount of variance in the original variables that is explained by the extracted factors. However, they differ in their interpretation and calculation.

Communalities are used to assess the overall adequacy of the factor solution. Higher communalities indicate that the extracted factors are accounting for a larger proportion of the variance in the original variables. If some variables have low communalities, it may indicate that they are not well represented by the factor solution and that additional factors may be needed to fully capture their variance.

Eigenvalues, on the other hand, are used to determine the number of factors to extract. Factors with eigenvalues greater than 1 are considered to be important and are typically retained. This is because factors with eigenvalues less than 1 explain less variance than a single original variable. Eigenvalues provide information about the relative importance of each factor in explaining the variance in the original variables.

In summary, communalities and eigenvalues are both important measures in factor analysis, but they serve different purposes. Communalities provide information about the overall adequacy of the factor solution, while eigenvalues are used to determine the number of factors to extract.

Meaning of communality

In factor analysis, communality represents the proportion of variance in each original variable that is accounted for by the extracted factors. In other words, it is the amount of shared variance between the original variable and the factors. Communality is computed as the sum of the squared factor loadings for each variable. The squared factor loadings represent the proportion of variance in the variable that is explained by each factor. By summing the squared factor loadings across all factors, we can obtain the proportion of total variance in the variable that is accounted for by the factors. This is the communality.

Communality ranges from 0 to 1, with higher values indicating that a larger proportion of the variance in the variable is explained by the factors. A communality of 1 indicates that all the variance in the variable is accounted for by the extracted factors, while a communality of 0 indicates that none of the variance in the variable is accounted for by the factors.

Communality is an important measure in factor analysis because it provides information about the overall adequacy of the factor solution. Higher communalities indicate that the extracted factors are accounting for a larger proportion of the variance in the original variables. If some variables have low communalities, it may indicate that they are not well represented by the factor solution and that additional factors may be needed to fully capture their variance.

In summary, communality is a measure of the amount of shared variance between the original variables and the extracted factors in factor analysis. It is an important measure for assessing the overall adequacy of the factor solution and identifying variables that may need additional factors to fully capture their variance.

Role of communality in Factor Analysis

Communality plays an important role in factor analysis in several ways:

Adequacy of the factor solution : Communality provides information about the overall adequacy of the factor solution. Higher communalities indicate that the extracted factors are accounting for a larger proportion of the variance in the original variables. If some variables have low communalities, it may indicate that they are not well represented by the factor solution and that additional factors may be needed to fully capture their variance.

Factor selection : Communality feeds into decisions about how many factors to retain. If many variables have low communalities under a given solution, the retained factors are not capturing their shared variance and additional factors may be needed; conversely, if communalities are uniformly high, a more parsimonious solution may suffice.

Interpretation of factors : The complement of communality (1 minus the communality, known as the uniqueness) is the variance in a variable that the extracted factors do not account for. Variables with high communality values are more strongly related to the extracted factors and are therefore the most useful for interpreting the meaning of each factor.

Identification of outliers : Communalities can be used to identify outliers in the data. Variables with extremely low communalities may be outliers and may need to be removed from the analysis.

Overall, communality is an important measure in factor analysis that provides information about the overall adequacy of the factor solution, helps in the selection of factors, aids in the interpretation of factors, and can be used to identify outliers in the data.

Computing communality

Computing communality involves calculating the proportion of variance in each original variable that is accounted for by the extracted factors in a factor analysis. Here’s how to compute communality:

Perform a factor analysis on the dataset using a chosen method and number of factors.

Obtain the factor loadings for each variable. These are the correlations between each variable and each factor.

Square the factor loadings for each variable to obtain the proportion of variance in the variable that is accounted for by each factor.

Sum the squared factor loadings across all factors to obtain the total proportion of variance in the variable that is accounted for by the extracted factors.

The sum of the squared factor loadings is the communality for the variable.

Communality ranges from 0 to 1, with higher values indicating that a larger proportion of the variance in the variable is accounted for by the extracted factors.

Here’s an example R code that computes communality for the built-in USArrests dataset:
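
A minimal sketch of such code, using base R's factanal() with a single factor (the apply() call sums the squared loadings row-wise, i.e., per variable):

```r
# Factor analysis with 1 factor on the built-in USArrests data
fa <- factanal(USArrests, factors = 1)

# Communality: sum of squared loadings for each variable
communality <- apply(fa$loadings^2, 1, sum)

print(communality)
```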

In this example, we performed a factor analysis with 1 factor on the USArrests dataset using the factanal() function in R. We then computed the communality for each variable by squaring the factor loadings and summing them across all factors using the apply() function in R. Finally, we printed the resulting communality values for each variable.

By examining the communality values, we can see which variables are most strongly related to the extracted factors and how much of their variance is accounted for by the factors.

Interpreting communality

Interpreting communality involves understanding the amount of variance in each original variable that is accounted for by the extracted factors in a factor analysis. Here are some key points to consider when interpreting communality:

High communality values: Variables with high communality values indicate that a large proportion of their variance is accounted for by the extracted factors. These variables are more strongly related to the factors and may be useful for interpreting the meaning of each factor.

Low communality values: Variables with low communality values indicate that a small proportion of their variance is accounted for by the extracted factors. These variables may not be well represented by the factor solution and may need additional factors to fully capture their variance.

Total variance accounted for: The sum of the communality values across all variables indicates the total proportion of variance in the dataset that is explained by the extracted factors. This can be used to assess the overall adequacy of the factor solution.

Outliers: Variables with extremely low communality values may indicate outliers in the data. These variables may not fit well with the overall pattern of the data and may need to be removed from the analysis.

Overlapping variance: It is important to note that communality measures the shared variance between the original variables and the extracted factors. Variables may have unique variance that is not accounted for by the factors. Thus, low communality values do not necessarily mean that a variable is unimportant or should be removed from the analysis.

Overall, interpreting communality involves understanding the extent to which the extracted factors account for the variance in the original variables. High communality values indicate that the extracted factors are strongly related to the variables, while low communality values may indicate the need for additional factors or the presence of outliers in the data.

Eigenvalue

In factor analysis, the eigenvalue of a factor represents the amount of variance in the original variables that is explained by that factor. Specifically, it is the sum of the squared factor loadings for the factor.

Eigenvalues provide information about the relative importance of each factor in explaining the variance in the original variables. Factors with higher eigenvalues explain a larger proportion of the variance in the data than factors with lower eigenvalues.

Eigenvalues are used to determine the number of factors to extract in a factor analysis. One common method for selecting the number of factors is to retain only those factors with eigenvalues greater than 1. This is because factors with eigenvalues less than 1 explain less variance than a single original variable. Another method for selecting the number of factors is to examine a scree plot, which shows the eigenvalues for each factor in descending order. The number of factors to extract is chosen at the “elbow” of the plot, where the eigenvalues start to level off.

It is important to note that eigenvalues are relative measures of importance and can be affected by the number of variables and the sample size.

Thus, it is recommended to use multiple methods for determining the number of factors to extract and to interpret the results in conjunction with other information, such as factor loadings and communalities.

Overall, eigenvalues are an important measure in factor analysis that provide information about the relative importance of each factor in explaining the variance in the original variables. They are used to determine the number of factors to extract and aid in the interpretation of the factor solution.

Role of eigenvalue

The eigenvalue plays an important role in factor analysis in several ways:

Determining the number of factors: Eigenvalues are used to determine the number of factors to extract in factor analysis. Factors with eigenvalues greater than 1 are considered to be important and are typically retained. This is because factors with eigenvalues less than 1 explain less variance than a single original variable. The number of factors to extract can also be determined by examining a scree plot of the eigenvalues.

Assessing factor importance: Eigenvalues provide information about the relative importance of each factor in explaining the variance in the original variables. Factors with higher eigenvalues explain a larger proportion of the variance in the data than factors with lower eigenvalues. This information can be used to assess the importance of each factor in the factor solution.

Interpreting factor meaning: Eigenvalues can aid in the interpretation of the meaning of each factor. Factors with high eigenvalues explain a larger proportion of the variance in the original variables and are more important for interpreting the meaning of the factor.

Identifying outliers: Eigenvalues can help flag anomalies in the data. Eigenvalues very close to zero indicate near-redundant (highly collinear) variables, and variables that also show low communalities may be outliers or poorly fitting items that warrant closer inspection or removal.

Overall, eigenvalues are an important measure in factor analysis that provide information about the number of factors to extract, the importance of each factor, and can aid in the interpretation of the factor solution. It is important to note that eigenvalues are relative measures of importance and should be used in conjunction with other information, such as factor loadings and communalities, to fully interpret the results of the factor analysis.

Computing eigenvalues

Computing the eigenvalues in factor analysis involves extracting the factors and calculating the amount of variance in the original variables that is explained by each factor.

Here’s how to compute the eigenvalues:

Calculate the correlation matrix for the original variables.

Use a matrix decomposition technique, such as the eigenvalue decomposition, to obtain the eigenvalues and eigenvectors of the correlation matrix.

The eigenvalues represent the amount of variance in the correlation matrix that is accounted for by each eigenvector. The eigenvalues are equal to the sum of the squared loadings for each factor. The eigenvalues can be used to determine the number of factors to retain in the factor solution. Factors with eigenvalues greater than 1 are typically considered to be important and are retained.

Here’s an example R code that computes the eigenvalues for the built-in USArrests dataset:
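
A minimal sketch along those lines, using base R's cor() and eigen():

```r
# Correlation matrix of the four USArrests variables
R <- cor(USArrests)

# Eigenvalue decomposition of the correlation matrix
e <- eigen(R)

# Eigenvalues: variance in the correlation matrix accounted for
# by each eigenvector (they sum to the number of variables)
print(e$values)
```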

In this example, we first computed the correlation matrix for the USArrests dataset using the cor() function in R. We then performed an eigenvalue decomposition of the correlation matrix using the eigen() function in R. Finally, we extracted the eigenvalues from the resulting eigenvalue decomposition and printed them to the console using the print() function. By examining the eigenvalues, we can determine the number of factors to retain in the factor solution and assess the importance of each factor in explaining the variance in the original variables.

Interpreting eigenvalues

Interpreting eigenvalues in factor analysis involves understanding the amount of variance in the original variables that is explained by each factor. Here are some key points to consider when interpreting eigenvalues:

Importance of each factor : Eigenvalues provide information about the importance of each factor in explaining the variance in the original variables. Factors with higher eigenvalues explain a larger proportion of the variance in the data than factors with lower eigenvalues. Factors with eigenvalues greater than 1 are typically considered to be important and are retained in the factor solution.

Number of factors to extract : Eigenvalues are used to determine the number of factors to extract in factor analysis. The number of factors can be determined by retaining only those factors with eigenvalues greater than 1 or by examining a scree plot of the eigenvalues. The number of factors to extract should be based on a combination of the eigenvalues, the factor loadings, and the overall interpretability of the factor solution.

Overlapping variance : It is important to note that eigenvalues measure the shared variance between the original variables and the extracted factors. Variables may have unique variance that is not accounted for by the factors. Thus, low eigenvalues do not necessarily mean that a factor is unimportant or should be removed from the analysis.

Sample size : Eigenvalue estimates depend on sample size. In small samples they are noisy, and the largest sample eigenvalues tend to be inflated relative to their population values, which is one motivation for parallel analysis. Thus, it is important to interpret eigenvalues in conjunction with the factor loadings and communalities.

In summary, interpreting eigenvalues involves understanding the importance of each factor in explaining the variance in the original variables and determining the number of factors to extract in the factor solution. Eigenvalues should be used in conjunction with other information, such as factor loadings and communalities, to fully interpret the results of the factor analysis.

Using eigenvalues and communality in factor analysis

Eigenvalues and communality are both important measures in factor analysis that provide information about the relative importance of each factor and the amount of variance in the original variables that is accounted for by the factors. Here’s how to use eigenvalues and communality in factor analysis:

Determine the number of factors to extract: Eigenvalues can be used to determine the number of factors to extract in factor analysis. Factors with eigenvalues greater than 1 are typically considered to be important and are retained in the factor solution. However, the number of factors to extract should also be based on other factors, such as the factor loadings and the overall interpretability of the factor solution.

Assess factor importance: Eigenvalues provide information about the relative importance of each factor in explaining the variance in the original variables. Factors with higher eigenvalues explain a larger proportion of the variance in the data than factors with lower eigenvalues. This information can be used to assess the importance of each factor in the factor solution.

Assess variable importance: Communality measures the amount of variance in the original variables that is accounted for by the extracted factors. Variables with high communality values have a larger proportion of their variance accounted for by the factors and are more important for interpreting the meaning of each factor. Variables with low communality values may not be well represented by the factor solution and may need additional factors to fully capture their variance.

Interpret the factor solution: Eigenvalues and communality can be used in conjunction with factor loadings to interpret the factor solution. High eigenvalues and communality values indicate that the extracted factors are important and explain a large proportion of the variance in the original variables. Additionally, variables with high communality values and strong factor loadings on a particular factor provide insight into the meaning of each factor.

Overall, eigenvalues and communality are important measures in factor analysis that provide information about the relative importance of each factor and the amount of variance in the original variables that is accounted for by the factors. They are used to determine the number of factors to extract, assess the importance of each factor and variable, and aid in the interpretation of the factor solution.

Example 1

For this example, we will use the built-in USArrests dataset in R, which contains data on violent crime rates in US states. We will perform a principal component analysis (PCA) on the dataset and compute both eigenvalues and communality.
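
A sketch of the analysis described below. One caveat: princomp() returns unit-length eigenvectors as loadings, so communalities computed over all components are trivially 1; for a k-component solution, the loadings are first rescaled by the component standard deviations.

```r
data(USArrests)

# PCA on the correlation matrix
pca <- princomp(USArrests, cor = TRUE)

# Eigenvalues: squared standard deviations of the components
eigenvalues <- pca$sdev^2
print(eigenvalues)

# Communalities for a k-component solution: rescale the eigenvector
# loadings by the component standard deviations, then sum the squared
# loadings for each variable (row-wise)
k <- 2
L <- pca$loadings[, 1:k] %*% diag(pca$sdev[1:k])
communality <- rowSums(L^2)
print(communality)
```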

In this example, we first loaded the USArrests dataset in R. We then performed a PCA on the dataset using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). We then extracted the eigenvalues by squaring the standard deviations of the principal components (pca$sdev^2) and the communality by summing the squared loadings for each variable (apply(pca$loadings^2, 1, sum)). Finally, we printed the eigenvalues and communality to the console using the print() function.

The output of the code will show us the eigenvalues and communality for each principal component. The eigenvalues represent the amount of variance explained by each principal component, while the communality represents the proportion of variance in each variable that is accounted for by all the principal components.

Interpretation of the results:

Eigenvalues : The output will show four eigenvalues, one for each principal component, in decreasing order. The eigenvalues represent the amount of variance explained by each principal component. Because the correlation matrix is used, the eigenvalues sum to 4 (the number of variables); for example, if the first eigenvalue is about 2.5, the first principal component explains roughly 2.5 of those 4 units of variance, or about 62% of the total.

Communality : The output will show four communality values, one for each variable. The communality represents the proportion of variance in each variable that is accounted for by all the principal components. For example, if the communality value for Murder is 0.81, this means that the principal components together account for 81% of the variance in the Murder variable.

Overall, the eigenvalues and communality help us understand how much variance is explained by the principal components and how important each variable is in the analysis. We can use this information to decide how many principal components to retain and how to interpret the results of the PCA.

For this example, we will use the built-in mtcars dataset in R, which contains data on various characteristics of 32 cars. We will perform a PCA on the dataset and compute both eigenvalues and communality.
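
A sketch for mtcars, following the same recipe (the same princomp() caveat about rescaling the loadings applies here too):

```r
data(mtcars)

# PCA on the correlation matrix of all 11 variables
pca <- princomp(mtcars, cor = TRUE)

eigenvalues <- pca$sdev^2
communality <- apply(pca$loadings^2, 1, sum)

print(eigenvalues)
print(communality)
```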

In this example, we first loaded the mtcars dataset in R. We then performed a PCA on the dataset using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). We then extracted the eigenvalues by squaring the standard deviations of the principal components (pca$sdev^2) and the communality by summing the squared loadings for each variable (apply(pca$loadings^2, 1, sum)). Finally, we printed the eigenvalues and communality to the console using the print() function.

The output of the code will show us the eigenvalues and communality for each principal component. The eigenvalues represent the amount of variance explained by each principal component, while the communality represents the proportion of variance in each variable that is accounted for by all the principal components.

Interpretation of the results:

Eigenvalues : The output will show 11 eigenvalues, one for each principal component. The first eigenvalue will be the largest, followed by the second, third, and so on. Because the correlation matrix is used, the eigenvalues sum to 11 (the number of variables); for example, if the first eigenvalue is about 6.6, the first principal component explains roughly 6.6 of those 11 units of variance, or about 60% of the total.

Communality : The output will show 11 communality values, one for each variable. The communality represents the proportion of variance in each variable that is accounted for by all the principal components. For example, if the communality value for mpg is 0.72, this means that the principal components together account for 72% of the variance in the mpg variable.

For this example, we will use the built-in iris dataset in R, which contains data on the measurements of iris flowers. We will perform a PCA on the dataset and compute both eigenvalues and communality.
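
A sketch for iris, again following the same recipe:

```r
data(iris)

# PCA on the four measurement columns (column 5 is the species factor)
pca <- princomp(iris[, 1:4], cor = TRUE)

eigenvalues <- pca$sdev^2
communality <- apply(pca$loadings^2, 1, sum)

print(eigenvalues)
print(communality)
```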

In this example, we first loaded the iris dataset in R. We then performed a PCA on the first four columns of the dataset (which contain the measurements of the flowers) using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). We then extracted the eigenvalues by squaring the standard deviations of the principal components (pca$sdev^2) and the communality by summing the squared loadings for each variable (apply(pca$loadings^2, 1, sum)).

Finally, we printed the eigenvalues and communality to the console using the print() function. The output of the code will show us the eigenvalues and communality for each principal component. The eigenvalues represent the amount of variance explained by each principal component, while the communality represents the proportion of variance in each variable that is accounted for by all the principal components.

Eigenvalues : The output will show four eigenvalues, one for each principal component. The first eigenvalue will be the largest, followed by the second, third, and fourth. The eigenvalues represent the amount of variance explained by each principal component. For example, if the first eigenvalue is 2.93, this means that the first principal component explains 2.93 units of variance in the original data.

Communality : The output will show four communality values, one for each variable. The communality represents the proportion of variance in each variable that is accounted for by all the principal components. For example, if the communality value for Sepal.Length is 0.76, this means that the principal components together account for 76% of the variance in the Sepal.Length variable.

Module V: Factor Loading

Factor loading is a key concept in factor analysis that refers to the correlation between each variable and each factor. Specifically, factor loading represents the extent to which each variable is associated with a particular factor, and is typically expressed as a coefficient that ranges from -1 to 1.

In factor analysis, the goal is to identify a small number of underlying factors that can explain the variance in a set of observed variables. Factor loading is an important measure because it provides information about which variables are most strongly associated with each factor. Variables with high factor loadings on a particular factor are considered to be most strongly related to that factor, while variables with low factor loadings are considered to be less related.

To calculate factor loadings in factor analysis, we first extract the factors from the data using a method such as principal component analysis (PCA) or maximum likelihood estimation. We then calculate the correlation between each variable and each factor, which gives us a set of factor loadings. The factor loadings for each variable can be visualized as a vector that indicates the direction and strength of the association between the variable and each factor.

Interpreting factor loadings involves examining both the magnitude and sign of the coefficients. A positive factor loading indicates a positive association between the variable and the factor, while a negative factor loading indicates a negative association. The magnitude of the factor loading gives us information about the strength of the association, with larger magnitudes indicating stronger associations.

Overall, factor loading is an important concept in factor analysis that helps us understand the relationship between variables and underlying factors. It can be used to identify which variables are most strongly associated with each factor, and to interpret the meaning of the factors in the factor solution.

Meaning and Definition of factor loading

Factor loading is a statistical measure that represents the correlation between a variable and a factor in factor analysis. In other words, it indicates how much of the variance in a particular variable can be explained by a particular factor. Factor loading is a key concept in factor analysis because it helps us understand the relationship between variables and factors, and can be used to identify the most important variables in a factor solution.

More specifically, factor loading is a coefficient that ranges from -1 to 1, with positive values indicating a positive relationship between the variable and the factor, negative values indicating a negative relationship, and values closer to zero indicating a weaker relationship. The magnitude of the factor loading indicates the strength of the relationship, with larger values indicating stronger relationships.

In factor analysis, the goal is to identify underlying factors that can explain the variance in a set of observed variables. Factor loading plays an important role in this process because it helps us determine which variables are most strongly associated with each factor. Variables with high factor loadings on a particular factor are considered to be most strongly related to that factor, while variables with low factor loadings are considered to be less related.

Factor loading can be calculated using various methods, including principal component analysis (PCA), maximum likelihood estimation, and other factor extraction methods. Once the factor loadings are calculated, they can be used to interpret the meaning of the factors in the factor solution and to identify which variables are most important in explaining the underlying factors.

Overall, factor loading is a crucial concept in factor analysis that helps us understand the relationship between variables and factors, and can be used to identify the most important variables in a factor solution.

Role of factor loading

The role of factor loading in factor analysis is to measure the strength and direction of the relationship between a variable and a factor. Factor loading is a key concept in factor analysis because it helps us understand which variables are most strongly associated with each factor, and can be used to interpret the meaning of the factors in the factor solution.

The factor loading for a variable and factor is a correlation coefficient that ranges from -1 to 1, where positive values indicate a positive relationship between the variable and the factor, and negative values indicate a negative relationship. The magnitude of the factor loading indicates the strength of the relationship, with larger values indicating stronger relationships.

The importance of factor loading lies in its ability to help us identify the variables that are most important in explaining the underlying factors. Variables with high factor loadings on a particular factor are considered to be most strongly related to that factor, while variables with low factor loadings are considered to be less related. By examining the factor loadings, we can determine which variables contribute the most to each factor, and use this information to interpret the meaning of the factors.

Factor loading is also used to determine the number of factors that should be retained in the factor solution. In general, factors with high total variance explained and high average factor loadings are retained, while factors with low variance explained and low average factor loadings are discarded.

Overall, the role of factor loading in factor analysis is to measure the strength and direction of the relationship between variables and factors, and to help us identify the variables that are most important in explaining the underlying factors. By examining the factor loadings, we can interpret the meaning of the factors, determine the number of factors to retain, and gain insights into the underlying structure of the data.

Computing factor loading

To compute factor loadings in factor analysis, we need to first extract the factors from the data using a method such as principal component analysis (PCA) or maximum likelihood estimation. Once the factors have been extracted, we can calculate the correlation between each variable and each factor, which gives us a set of factor loadings. Here’s an example of how to compute factor loadings using PCA in R:
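
A minimal sketch, assuming the iris measurements and princomp():

```r
data(iris)

# PCA on the four measurement columns using the correlation matrix
pca <- princomp(iris[, 1:4], cor = TRUE)

# Component loadings for each variable on each factor
print(pca$loadings)
```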

In this example, we first loaded the iris dataset in R. We then performed a PCA on the first four columns of the dataset (which contain the measurements of the flowers) using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). We then extracted the factor loadings from the PCA using the $loadings attribute of the pca object. Finally, we printed the factor loadings to the console using the print() function.

The output of the code will show us the factor loadings for each variable and each factor. The factor loadings for each variable can be visualized as a vector that indicates the direction and strength of the association between the variable and each factor.

Interpreting the factor loadings involves examining both the magnitude and sign of the coefficients. A positive factor loading indicates a positive association between the variable and the factor, while a negative factor loading indicates a negative association. The magnitude of the factor loading gives us information about the strength of the association, with larger magnitudes indicating stronger associations.

Overall, computing factor loadings using PCA in R (or any other statistical software) is a key step in factor analysis that helps us understand the relationship between variables and underlying factors, and can be used to interpret the meaning of the factors in the factor solution.

Interpreting factor loading

Interpreting factor loading is an important step in factor analysis, as it helps us understand the relationship between variables and underlying factors. Factor loading is a coefficient that ranges from -1 to 1, with positive values indicating a positive relationship between the variable and the factor, negative values indicating a negative relationship, and values closer to zero indicating a weaker relationship.

To interpret factor loading, we need to examine both the magnitude and sign of the coefficient. A positive factor loading indicates that the variable increases as the factor increases, while a negative factor loading indicates that the variable decreases as the factor increases. The magnitude of the factor loading indicates the strength of the relationship between the variable and the factor, with larger values indicating stronger relationships. Generally, factor loadings with a magnitude of 0.3 or greater are considered to be meaningful and potentially useful for interpretation. However, the cutoff for “meaningful” factor loadings may vary depending on the specific research question or context. Additionally, it’s important to consider the overall pattern of factor loadings across variables and factors to gain a holistic understanding of the underlying structure of the data.

Factor loadings can be visualized using a scatter plot or a biplot. A scatter plot shows the relationship between two variables in a two-dimensional space, with each variable represented as a point. Factor loadings can be added to the scatter plot as vectors that indicate the direction and strength of the association between the variables and the factors. A biplot is a type of scatter plot that shows both the variables and the factors on the same plot, allowing us to see the relationships between variables and factors in a single visual.

Overall, interpreting factor loading in factor analysis is a crucial step in understanding the underlying structure of the data. By examining the magnitude and sign of the coefficients, we can identify which variables are most strongly associated with each factor, and use this information to interpret the meaning of the factors and gain insights into the underlying structure of the data.

R code for computing factor loadings and interpreting the results using the mtcars dataset in R.
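
A minimal sketch of the code the walkthrough below describes:

```r
data(mtcars)

# PCA on the correlation matrix
pca <- princomp(mtcars, cor = TRUE)

# Factor (component) loadings
print(pca$loadings)
```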

In this code, we first load the mtcars dataset in R. We then perform a principal component analysis (PCA) on the dataset using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). We then extract the factor loadings from the PCA using the $loadings attribute of the pca object. Finally, we print the factor loadings to the console using the print() function.

The output of the code will show us the factor loadings for each variable and each factor. The factor loadings for each variable can be visualized as a vector that indicates the direction and strength of the association between the variable and each factor.

To interpret the factor loadings, we need to examine both the magnitude and sign of the coefficients. A positive factor loading indicates a positive association between the variable and the factor, while a negative factor loading indicates a negative association. The magnitude of the factor loading gives us information about the strength of the association, with larger magnitudes indicating stronger associations.

For example, the output of the code may show us that the mpg variable has a strong negative factor loading on the first factor, while the disp variable has a strong positive factor loading on the first factor. This suggests that the first factor is primarily associated with fuel efficiency and engine size. We can use this information to interpret the meaning of the first factor and gain insights into the underlying structure of the data.

Overall, computing factor loadings using PCA in R and interpreting the results is a crucial step in factor analysis that helps us understand the relationship between variables and underlying factors, and can be used to interpret the meaning of the factors in the factor solution.

R code for computing factor loadings and interpreting the results using the USArrests dataset in R.
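
A minimal sketch, following the same pattern as the mtcars example:

```r
data(USArrests)

# PCA on the correlation matrix
pca <- princomp(USArrests, cor = TRUE)

# Factor (component) loadings
print(pca$loadings)
```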

In this code, we first load the USArrests dataset in R. This dataset contains the number of arrests per 100,000 residents for three violent crimes (murder, assault, and rape) in each of the 50 US states in 1973, together with the percentage of the population living in urban areas (UrbanPop). We then perform a principal component analysis (PCA) on the dataset using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). We then extract the factor loadings from the PCA using the $loadings attribute of the pca object.

Finally, we print the factor loadings to the console using the print() function. The output of the code will show us the factor loadings for each variable and each factor. The factor loadings for each variable can be visualized as a vector that indicates the direction and strength of the association between the variable and each factor.

To interpret the factor loadings, we need to examine both the magnitude and sign of the coefficients. For example, the output of the code may show us that the murder variable has a strong positive factor loading on the first factor, while the assault variable has a strong positive factor loading on both the first and second factor. This suggests that the first factor is primarily associated with violent crime in general, while the second factor is primarily associated with assault.

We can use this information to interpret the meaning of the factors and gain insights into the underlying structure of the data. Overall, computing factor loadings using PCA in R and interpreting the results is a crucial step in factor analysis that helps us understand the relationship between variables and underlying factors, and can be used to interpret the meaning of the factors in the factor solution.

Factor rotation

Factor rotation is a technique used in factor analysis to improve the interpretability of the factor solution. The goal of factor rotation is to find a new set of factors that are easier to interpret than the original unrotated factors.

In an unrotated solution, the factors are extracted to maximize explained variance, and each variable typically loads to some degree on several factors at once. This makes it difficult to interpret the meaning of each factor.

Factor rotation redistributes the loadings to approximate a "simple structure", in which each variable loads strongly on as few factors as possible. There are several methods of factor rotation, including orthogonal rotation methods such as Varimax and Quartimax, which keep the factors uncorrelated, and oblique rotation methods such as Promax and Oblimin, which allow the factors to correlate.

Orthogonal rotation methods, such as Varimax and Quartimax, rotate the factors in a way that maximizes the variance of the squared factor loadings within each factor. This results in a factor solution where each variable loads heavily on only one factor, making it easier to interpret the meaning of each factor. Oblique rotation methods, such as Promax and Oblimin, allow the factors to be correlated with each other, which may be more realistic in some cases where the factors are expected to be correlated. These methods rotate the factors to minimize the complexity of the factor solution, while still allowing the factors to be correlated.

To perform factor rotation in R, we can use the varimax() function from base R's stats package, or pass a rotate argument to factor analysis functions such as fa() and principal() in the psych package. Here is an example of performing Varimax rotation on a factor solution:
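
A minimal sketch using stats::varimax() on the first two components:

```r
data(iris)

# Unrotated solution: PCA on the correlation matrix
pca <- princomp(iris[, 1:4], cor = TRUE)

# Varimax rotation of the first two component loadings
rotated <- varimax(pca$loadings[, 1:2])

print(rotated$loadings)
```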

In this example, we first load the iris dataset in R and perform a principal component analysis using the princomp() function. We then apply Varimax rotation to the first two components using the varimax() function, and print the rotated factor loadings to the console using the print() function.

Overall, factor rotation is a useful technique in factor analysis that can help improve the interpretability of the factor solution by simplifying the relationships between variables and factors.

Why Factor rotation?

Factor rotation is used in factor analysis to improve the interpretability of the factor solution. The goal of factor analysis is to identify underlying factors that explain the patterns of correlations among a set of observed variables. However, the initial factor solution may not be easy to interpret because each variable may load on multiple factors, and the factors may not be clearly defined or easily distinguishable from each other.

Factor rotation helps to simplify the factor solution by rotating the original factors into a new set of orthogonal or oblique factors that are easier to interpret. By doing so, factor rotation can improve the clarity and meaningfulness of the factor solution, and help researchers identify the underlying constructs that are driving the patterns of correlations among the observed variables.

Orthogonal rotation methods, such as Varimax and Quartimax, rotate the factors to be orthogonal to each other, meaning that the factors are uncorrelated. This simplifies the interpretation of the factor solution by ensuring that each variable loads heavily on only one factor, and that each factor represents a distinct underlying construct. Orthogonal rotation is particularly useful when the factors are expected to be unrelated to each other.

Oblique rotation methods, such as Promax and Oblimin, allow the factors to be correlated with each other, which may be more realistic in some cases where the factors are expected to be correlated. These methods rotate the factors to minimize the complexity of the factor solution, while still allowing the factors to be correlated. Oblique rotation is particularly useful when the factors are expected to be related to each other, such as in the case of personality traits or cognitive abilities.

In summary, factor rotation is an important step in factor analysis that helps to simplify and clarify the factor solution, making it easier to interpret and understand the underlying constructs that are driving the patterns of correlations among the observed variables.

Methods of Factor rotation

There are two main methods of factor rotation in factor analysis: orthogonal rotation and oblique rotation.

Orthogonal Rotation : Orthogonal rotation methods, such as Varimax and Quartimax, rotate the original factors to be orthogonal to each other. This means that the factors are uncorrelated, and each variable loads heavily on only one factor. The main goal of orthogonal rotation is to simplify the factor solution and make it easier to interpret. The most commonly used orthogonal rotation method is Varimax, which maximizes the variance of the squared factor loadings within each factor. Quartimax is another orthogonal rotation method that focuses on minimizing the number of factors that are needed to explain the total variance in the data.

Oblique Rotation : Oblique rotation methods, such as Promax and Oblimin, allow the original factors to be correlated with each other. This means that the factors are not orthogonal, and each variable may load on multiple factors. The main goal of oblique rotation is to find a simpler factor structure that still accounts for the correlation among the variables.

Promax is the most commonly used oblique rotation method. It starts from an orthogonal (typically Varimax) solution and then relaxes it, simplifying the factor structure while allowing some correlation among the factors. Oblimin is another oblique rotation method that seeks simple structure by minimizing cross-loadings, permitting, but not forcing, correlations among the factors.

Both orthogonal and oblique rotation methods have their advantages and disadvantages, and the choice of rotation method depends on the specific research question and the nature of the data. Orthogonal rotation is more straightforward and easier to interpret, but may not be appropriate in cases where the factors are expected to be correlated. Oblique rotation is more flexible and can better account for the correlation among the variables, but may produce a more complex factor solution that is harder to interpret. It’s important to carefully consider the benefits and drawbacks of each rotation method before selecting the appropriate one for a given analysis.

Computing and Interpreting Varimax and Quartimax Rotations in R

R code for computing and interpreting Varimax and Quartimax rotations using the mtcars dataset in R:
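
A sketch of both rotations; varimax() comes from base R, while quartimax() is assumed to come from the GPArotation package, which must be installed:

```r
library(GPArotation)  # provides quartimax(); assumed installed

data(mtcars)

pca <- princomp(mtcars, cor = TRUE)
L <- pca$loadings[, 1:2]  # retain the first two components

# Varimax rotation (base R stats package)
vm <- varimax(L)
print(vm$loadings)

# Quartimax rotation (GPArotation package)
qm <- quartimax(L)
print(qm$loadings)
```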

In this code, we first load the mtcars dataset in R. We then perform a principal component analysis (PCA) on the dataset using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). We then extract the factor loadings from the PCA using the $loadings attribute of the pca object.

Next, we perform Varimax and Quartimax rotations on the retained component loadings, using the varimax() function from base R and the quartimax() function from the GPArotation package, respectively.

Finally, we print the rotated factor loadings to the console using the print() function. To interpret the results, we look at the loadings of each variable on each factor in the rotated factor solution. A high loading indicates a strong relationship between the variable and the factor, while a low loading indicates a weak relationship.

For example, the output of the code for Varimax rotation may show us that the mpg variable has a high loading on the first factor, while the disp variable has a high loading on the second factor. This suggests that the first factor is primarily associated with fuel efficiency, while the second factor is primarily associated with engine size.

Similarly, the output of the code for Quartimax rotation may show us that the mpg variable has a high loading on the first factor, while the cyl variable has a high loading on the second factor. This suggests that the first factor is primarily associated with fuel efficiency, while the second factor is primarily associated with the number of cylinders in the engine.

Overall, Varimax and Quartimax rotations can yield different factor solutions, depending on the nature of the data and the research question. It’s important to carefully interpret the rotated factor loadings and choose the rotation method that best fits the research question and provides the most interpretable factor solution.
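
Exploratory Factor Analysis (EFA)

The next example works with an EFA of five agreeableness items (A1-A5). A minimal sketch of how such a solution might be produced, assuming the bfi data that ships with the psych package (moved to psychTools in newer versions):

```r
library(psych)   # fa(); the default oblimin rotation also needs GPArotation

# bfi ships with psych (older versions) or psychTools (newer versions)
dat <- na.omit(bfi[, c("A1", "A2", "A3", "A4", "A5")])

# EFA with 2 factors (minres extraction, oblimin rotation by default)
efa <- fa(dat, nfactors = 2)

print(efa$loadings)
print(efa$communality)
```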

In this example, the EFA results suggest that there are two factors that explain the variance in the five variables. The first factor (MR2) has high loadings on “A5” and moderate loadings on “A4” and “A1”, while the second factor (MR1) has high loadings on “A2” and “A3”. The communalities range from 0.67 to 0.88, indicating that a high proportion of the variance in each variable is explained by the two factors.

Confirmatory Factor Analysis (CFA)

To conduct CFA, we first need to specify a theoretical model that specifies how the variables are related to the factors. In this example, we will specify a model in which all five variables load onto two factors, with the first factor (“F1”) defined by “A1”, “A2”, “A3”, and “A4”, and the second factor (“F2”) defined by “A5”. We will use the “cfa” function from the “lavaan” package to estimate the model:
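
A minimal sketch of the model syntax and fit, assuming the A1-A5 items come from the bfi data as above. Because F2 has only a single indicator, its residual variance is fixed to zero so the model is identified:

```r
library(lavaan)
library(psych)   # bfi data (psychTools in newer versions of psych)

dat <- na.omit(bfi[, c("A1", "A2", "A3", "A4", "A5")])

# Two-factor CFA; the single-indicator factor F2 needs its residual
# variance fixed for identification
model <- '
  F1 =~ A1 + A2 + A3 + A4
  F2 =~ A5
  A5 ~~ 0*A5
'

fit <- cfa(model, data = dat)
summary(fit, standardized = TRUE, fit.measures = TRUE)
```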

This will print out the factor loadings for each variable, the standardized coefficients, and fit indices.

In this example, the CFA results suggest that the specified model fits the data well, based on the fit indices. The factor loadings are similar to those obtained from the EFA, with “A5” loading primarily on the second factor (F2) and the other four variables loading primarily on the first factor (F1).

Varimax, Oblimin and Promax Factor Rotation Techniques

Varimax, oblimin, and promax are all methods of factor rotation in exploratory factor analysis, and they differ in their approach to the rotation of the factor matrix. Here is a brief explanation of each method and how to implement it in R using the psych package:

Varimax Rotation

Varimax is an orthogonal rotation method, which means that it produces uncorrelated factors that are easier to interpret. It rotates the factor matrix to maximize the variance of the squared loadings for each factor.

Oblimin Rotation

Oblimin is a non-orthogonal rotation method, which means that it allows for correlated factors. It rotates the factor matrix to minimize the number of factors that have high loadings on a given variable.

Promax Rotation

Promax is also a non-orthogonal rotation method. It first applies an orthogonal (varimax) rotation, then raises the loadings to a power to push small loadings toward zero, allowing the factors to become correlated to whatever degree best simplifies the factor structure.

When choosing a rotation method, it is important to consider the underlying structure of the data and the research question. Orthogonal rotation methods like varimax are often used when the factors are expected to be uncorrelated, while non-orthogonal methods like oblimin and promax are better suited for situations where correlations between factors are expected.
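
A sketch comparing the three rotations via the psych package's fa() function (the oblimin option additionally requires the GPArotation package):

```r
library(psych)

data(mtcars)

fa_varimax <- fa(mtcars[, 1:7], nfactors = 2, rotate = "varimax")
fa_oblimin <- fa(mtcars[, 1:7], nfactors = 2, rotate = "oblimin")
fa_promax  <- fa(mtcars[, 1:7], nfactors = 2, rotate = "promax")

# Compare the rotated loading patterns side by side
print(fa_varimax$loadings)
print(fa_oblimin$loadings)
print(fa_promax$loadings)
```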

Module VI: Factor Scores

Factor scores are values that represent the degree to which each observation (e.g., individual, subject, or item) in a dataset is associated with each underlying factor identified in a factor analysis. The factor scores are estimated based on the factor loadings and the observed values of the variables in the dataset.

In factor analysis, the goal is to identify the underlying factors that explain the patterns of correlations among a set of observed variables. Once the factors have been identified, the factor scores can be computed as weighted sums of the observed variables, where the weights are the factor loadings. The factor scores provide a way to summarize the information in the original dataset in terms of the underlying factors, and can be used in subsequent analyses to examine the relationships between the factors and other variables of interest.

There are several methods for computing factor scores, including regression-based methods, Bartlett's method, and the Anderson-Rubin method.

The most commonly used approach is the regression method, which estimates each observation's factor scores as a weighted linear combination of the observed variables, with weights derived from the factor loadings and the correlations among the variables. This method assumes that the factor scores are normally distributed and that the errors of the regression model are uncorrelated and have equal variances.

To compute factor scores in R, we can use the factanal() function, which performs factor analysis and can compute factor scores via its scores argument. Here is an example code for computing factor scores using the mtcars dataset in R:
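
A minimal sketch; note that base R's factanal() returns scores only when they are requested via the scores argument:

```r
data(mtcars)

# Factor analysis with 3 factors, requesting regression-method scores
fa <- factanal(mtcars, factors = 3, scores = "regression")

# Estimated factor scores for each observation
print(fa$scores)
```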

In this code, we first load the mtcars dataset in R. We then perform a factor analysis on the dataset using the factanal() function, specifying that we want to extract 3 factors (factors = 3) and requesting regression-based factor scores (scores = "regression"). The factanal() function also reports the factor loadings and the uniquenesses (1 minus the communalities) by default.

The estimated factor scores for each observation are returned in the scores component of the fitted object (fa$scores).

Finally, we print the factor scores to the console using the print() function.

To interpret the results, we can examine the values of the factor scores for each observation and each factor. Higher values indicate a stronger association with the corresponding factor, while lower values indicate a weaker association. We can use these factor scores in subsequent analyses to examine the relationships between the factors and other variables of interest, or to group observations based on their similarity in terms of the underlying factors.

Meaning and Definition of Factor Scores

Factor scores are values that represent the degree to which each observation in a dataset is associated with each underlying factor identified in a factor analysis. They are computed as weighted sums of the observed variables, where the weights are the factor loadings.

The factor scores provide a way to summarize the information in the original dataset in terms of the underlying factors. They can be used in subsequent analyses to examine the relationships between the factors and other variables of interest, or to group observations based on their similarity in terms of the underlying factors.

In other words, factor scores are a way to quantify the degree to which each observation in a dataset exhibits the characteristics or traits that are captured by the underlying factors. For example, in a factor analysis of personality traits, the factor scores might represent the degree to which each individual in a sample exhibits the traits of extraversion, agreeableness, conscientiousness, neuroticism, and openness.

The interpretation of factor scores depends on the specific research question and the nature of the factors being analyzed. In general, higher factor scores indicate a stronger association with the corresponding factor, while lower factor scores indicate a weaker association. Factor scores can be used to identify individuals or items that are high or low on a particular factor, or to group individuals or items based on their similarity in terms of the underlying factors.

It’s important to note that factor scores are estimates, and may not be perfectly accurate. The accuracy of the factor scores depends on the quality and reliability of the factor analysis and the observed variables.

Additionally, the interpretation of factor scores is subject to the same limitations and assumptions as factor analysis itself, such as the assumption of linearity and the assumption of normality. Therefore, it's important to carefully interpret and use factor scores in conjunction with other measures and analyses to fully understand the underlying factors and their relationships with other variables of interest.

Computing factor scores

To compute factor scores in R, you can request them directly from the factanal() function via its scores argument, or take them from the scores component of a fit produced by the fa() or principal() functions in the psych package. Here's an example code using the iris dataset in R:
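
A minimal sketch; with only four observed variables, factanal() allows at most one common factor, so a single factor is extracted:

```r
data(iris)

# Single-factor model: factanal() permits at most one factor
# for four observed variables
fa <- factanal(iris[, 1:4], factors = 1, scores = "regression")

print(fa$scores)
```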

In this code, we first load the iris dataset in R. We then perform a factor analysis on the first four columns of the dataset (which correspond to the measurements of sepal length, sepal width, petal length, and petal width) using the factanal() function. Note that with only four observed variables, factanal() permits at most one common factor (two factors would leave negative degrees of freedom), so we extract a single factor (factors = 1).

Next, we obtain the factor scores from the scores component of the fitted object, which contains the estimated factor scores for each observation in the dataset.

Finally, we print the factor scores to the console using the print() function. The output of the code will be a matrix with the same number of rows as the original dataset, and with a number of columns equal to the number of extracted factors. Each row represents an observation in the dataset, and each column represents the estimated factor score for that observation on the corresponding factor.

Interpretation of Factor Scores

Interpreting factor scores involves examining the values of the factor scores for each observation and each factor, and determining the degree to which each observation exhibits the characteristics or traits that are captured by the underlying factors. In general, higher factor scores indicate a stronger association with the corresponding factor, while lower factor scores indicate a weaker association. The magnitude of the factor score indicates the degree to which the observation exhibits the characteristics of the factor, while the sign of the factor score indicates the direction of the association (positive or negative).

To interpret factor scores, it's important to consider the nature of the factors being analyzed and the research question. For example, if the factors represent personality traits, higher scores on a factor such as extraversion might indicate a more outgoing and sociable personality, while lower scores might indicate a more introverted and reserved one. Similarly, if the factors represent cognitive abilities, higher scores on a factor such as verbal ability might indicate greater proficiency in language and communication. As noted above, factor scores are estimates whose accuracy depends on the quality and reliability of the factor analysis and the observed variables, and their interpretation is subject to the same assumptions (such as linearity and normality) as factor analysis itself.

Application of Factor Scores

Factor scores can be used in a variety of ways to explore and understand the relationships between the underlying factors and other variables of interest. Here are some common uses of factor scores:

Group comparisons: Factor scores can be used to compare groups of individuals or items on the underlying factors. For example, if the factors represent personality traits, factor scores can be used to compare groups with different personality profiles, such as introverts vs. extroverts.

Correlation analysis: Factor scores can be used in correlation analysis to examine the relationships between the underlying factors and other variables of interest. For example, if the factors represent cognitive abilities, factor scores can be used to examine the relationship between verbal ability and academic achievement.

Regression analysis: Factor scores can be used as predictor variables in regression analysis to predict outcomes of interest. For example, if the factors represent job-related skills, factor scores can be used to predict job performance.

Data reduction: Factor scores can be used to summarize the information in a dataset in terms of the underlying factors, which is useful for data visualization and exploratory data analysis (see the sketch below).
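As a brief illustration of the data reduction use just mentioned, here is a minimal sketch that clusters observations on their factor scores, assuming fa is the factanal fit from the iris example above (fitted with scores = "regression"):

```r
# Factor scores from the fitted model: one row per observation
scores <- fa$scores

# Group observations by their factor-score profiles and compare
# the resulting clusters to the known species labels
set.seed(1)
clusters <- kmeans(scores, centers = 3)$cluster
table(clusters, iris$Species)
```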

It's important to note that the interpretation and use of factor scores depend on the specific research question and the nature of the factors being analyzed. It's also important to carefully consider the limitations and assumptions of factor analysis and factor scores, and to use them in conjunction with other measures and analyses to fully understand the underlying factors and their relationships with other variables of interest.

Example 1: Computing and Interpreting Factor Scores Using the mtcars Dataset in R
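A minimal sketch of the code described below, assuming the psych package (and its GPArotation dependency for the default oblique rotation) is installed:

```r
library(psych)

# Fit a two-factor model to the first seven columns of mtcars
# (mpg, cyl, disp, hp, drat, wt, qsec)
data(mtcars)
fa_fit <- fa(mtcars[, 1:7], nfactors = 2)

# Estimated factor scores are stored in the scores component
print(head(fa_fit$scores))
```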

In this code, we first load the mtcars dataset in R. We then perform a factor analysis on the first seven columns of the dataset (which correspond to mpg, cyl, disp, hp, drat, wt, and qsec) using the fa() function from the psych package, specifying that we want to extract 2 factors (nfactors = 2). Next, we access the estimated factor scores for each observation, which are stored in the scores component of the fitted object (fa_fit$scores). Finally, we print the factor scores to the console using the print() function.

To interpret the results, we examine the values of the factor scores for each observation and each factor. Higher values indicate a stronger association with the corresponding factor, while lower values indicate a weaker association. We can use these factor scores in subsequent analyses to examine the relationships between the factors and other variables of interest, or to group observations based on their similarity on the underlying factors.

For example, let’s say that the first factor represents vehicle size and power, and the second factor represents fuel efficiency. We can interpret the factor scores as follows:

Higher scores on the first factor indicate larger and more powerful vehicles, while lower scores indicate smaller and less powerful vehicles.

Higher scores on the second factor indicate more fuel-efficient vehicles, while lower scores indicate less fuel-efficient vehicles.

We can use these interpretations to further explore the relationships between the factors and other variables of interest. For example, we could examine the relationship between the factor scores and the cost of the vehicles, or drivers' satisfaction ratings. As before, the interpretation and use of factor scores depend on the specific research question and the nature of the factors being analyzed, and should account for the limitations and assumptions of factor analysis.

Example 2: Computing and Interpreting Factor Scores Using the USArrests Dataset in R
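A minimal sketch of the code described below, using base R's princomp():

```r
# Principal component analysis of the USArrests data,
# based on the correlation matrix
data(USArrests)
pca <- princomp(USArrests, cor = TRUE)

# Component ("factor") scores for each state: one row per state,
# one column per component
scores <- predict(pca, USArrests)
print(head(scores))
```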

In this code, we first load the USArrests dataset in R. We then perform a principal component analysis on the dataset using the princomp() function, specifying that we want to use the correlation matrix (cor = TRUE). Next, we compute the factor scores using the predict() function, which takes the pca object and the original data as input, and outputs the estimated factor scores for each observation in the dataset.

Finally, we print the factor scores to the console using the print() function. To interpret the results, we can examine the values of the factor scores for each observation and each factor. Higher values indicate a stronger association with the corresponding factor, while lower values indicate a weaker association. We can use these factor scores in subsequent analyses to examine the relationships between the factors and other variables of interest, or to group observations based on their similarity in terms of the underlying factors.

For example, let’s say that the first factor represents overall crime rate, while the second factor represents violent crime rate. We can interpret the factor scores as follows: Higher scores on the first factor indicate a higher overall crime rate in the state, while lower scores indicate a lower overall crime rate. Higher scores on the second factor indicate a higher violent crime rate in the state, while lower scores indicate a lower violent crime rate.

We can use these interpretations to further explore the relationships between the factors and other variables of interest. For example, we could examine the relationship between the factor scores and the state’s population density, or the state’s poverty rate.

As with the previous example, the interpretation and use of factor scores depend on the specific research question and the nature of the factors being analyzed, and they should be used alongside other measures and analyses, with the limitations and assumptions of factor analysis kept in mind.

Practical Class 1: Exploratory Factor Analysis in R

Practical Class 2: Principal Component Analysis in R

In conclusion, this primer has provided an overview of factor analysis, a statistical technique commonly used in research to identify underlying dimensions or constructs that explain the variability among a set of observed variables. We have covered the meaning and assumptions of factor analysis, the differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), and the procedure for conducting factor analysis. We have also discussed the role of the correlation matrix and a general model for the correlations among the individual variables, as well as methods for extracting factors, such as principal component analysis (PCA), and for determining the number of factors to be extracted.

Furthermore, we have covered the meaning and interpretation of communality and eigenvalues, factor loading and rotation methods, such as varimax, and the meaning and interpretation of factor scores and their use in subsequent analyses. Throughout the paper, we have used the R software to provide reproducible examples and code for conducting factor analysis.

By understanding the fundamental concepts of factor analysis and how to apply it in their research, readers will be able to identify underlying constructs or dimensions that may not be directly observable, and use these constructs to better understand the relationships between variables. Factor analysis can be a useful tool for researchers in a variety of fields, and this tutorial paper has provided a comprehensive guide to help readers get started with conducting factor analysis in their own research.



Understanding Factor Analysis in Psychology

John Loeppky is a freelance journalist based in Regina, Saskatchewan, Canada, who has written about disability and health for outlets of all kinds.


What Is Factor Analysis and What Does It Do?


Like many methods encountered by those studying psychology, factor analysis has a long history. It was originally discussed by British psychologist Charles Spearman in the early 20th century and has gone on to be used not only in psychology but also in other fields that rely on statistical analysis.

But what is it, what are some real-world examples, and what are the different types? In this article, we'll answer all of those questions.

The primary goal of factor analysis is to distill a large data set into a working set of connections or factors. Dr. Jessie Borelli, PhD, who works at the University of California-Irvine, uses factor analysis in her work on attachment.

She researches how people perceive relationships and how they connect to one another. She gives the example of a hypothetical questionnaire with 100 items, using factor analysis to drill deeper into the data: "Rather than looking at each individual item on its own, I'd rather say, 'Is there any way in which these items kind of cluster together or go together so that I can... create units of analysis that are bigger than the individual items?'"

Factor analysis seeks to identify patterns in data where connections among variables are assumed to already exist.

An Example Where Factor Analysis Is Useful

A common example of factor analysis is taking something not easily quantifiable, like socio-economic status, and using it to group together highly correlated variables such as income level and type of job.

Factor analysis isn't just used in psychology; it is also deployed in fields like sociology and business, and in technology-sector fields like machine learning.

Two types of factor analysis are most commonly referred to: exploratory factor analysis and confirmatory factor analysis.

  • Exploratory factor analysis: The goal of this analysis is to find general patterns in a set of data points.
  • Confirmatory factor analysis: The goal of this analysis is to test hypothesized relationships among certain variables.

Exploratory Analysis

In an exploratory analysis, you are being a little more open-minded as a researcher, because you are using this type of analysis to find structure in your data set that you haven't yet identified. It's an approach that Borelli uses in her own research.

Confirmatory Factor Analysis

On the other hand, if you're using a confirmatory factor analysis you are using the assumptions or theoretical findings you have already identified to drive your statistical model.

Unlike in an exploratory factor analysis, where the relationships between factors and variables are more open, a confirmatory factor analysis requires you to select which variables you are testing for. In Borelli's words:

"When you do a confirmatory factor analysis, you kind of tell your analytic program what you think the data should look like, in terms of, 'I think it should have these two factors and this is the way I think it should look.'"

Let's take a look at the advantages and disadvantages of factor analysis.

A main advantage of factor analysis is that it allows researchers to reduce many variables by combining them into a single factor.

You Can Analyze Fewer Data Points

When answering your research questions, it's a lot easier to be working with three variables than thirty, for example.

Disadvantages

Disadvantages include that factor analysis depends on the quality of the data and may allow for different interpretations of the same data. For example, in one study, Borelli found that after deploying a factor analysis she was still left with results that didn't connect well with what had been found in hundreds of other studies.

Because the sample was new and more culturally diverse than those previously explored, her exploratory factor analysis left her with more questions than answers.

The goal of factor analysis in psychology is often to make connections that allow researchers to develop models with common factors in ways that might be hard or impossible to observe otherwise.

So, for example, intelligence is a difficult concept to directly observe. However, it can be inferred from factors that we can directly measure on specific tests.

Factor analysis has often been used in the field of psychology to help us better understand the structure of personality.

This is due to the multitude of factors researchers must consider when trying to understand the concept of personality. This area of research is certainly not new: studies dating as far back as 1942 recognized the power of factor analysis in personality research.

Britannica. Charles E. Spearman .

United States Environmental Protection Agency. Exploratory Data Analysis .

Flora DB, Curran PJ. An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data .  Psychol Methods . 2004;9(4):466-491. doi:10.1037/1082-989X.9.4.466

Wolfle D. Factor analysis in the study of personality .  The Journal of Abnormal and Social Psychology. 1942;37(3):393–397.


Research on factor analysis and method for evaluating grouting effects using machine learning

DOI: 10.1038/s41598-024-57837-x

The evaluation of grouting effects constitutes a critical aspect of grouting engineering. As grouting projects mature, the workload and empirical character of grouting effect evaluation become increasingly apparent. In the context of the Qiuji coal mine's directional drilling and grouting for limestone aquifer reformation, this study thoroughly analyzes the influencing factors of grouting effects from geological and engineering perspectives, comparing these with various engineering indices associated with drilling and grouting. This led to the establishment of a "dual-process, multi-parameter, and multi-factor" system, employing correlation analysis to validate the selected indices' reasonableness and scientific merit. Utilizing the chosen indices, eight high-performing machine learning models and three parameter optimization algorithms were employed to develop a model for assessing the effectiveness of directional grouting in limestone aquifers. The model's efficacy was evaluated with accuracy, recall, precision, and F-score metrics, followed by practical engineering validation. Results indicate that the "dual-process, multi-parameter, multi-factor" system elucidates the relationship between influencing factors and engineering parameters, demonstrating the intricacy of evaluating grouting effects. Analysis revealed that the correlation among the eight selected indicators (the proportion of boreholes in the target rock strata, drilling length, leakage, water level, grouting pressure, mass of slurry injected, and the permeability of the limestone aquifers before and after grouting) is not substantial, underscoring their viability as independent indicators for grouting effect evaluation. Comparative analysis showed that the AdaBoost machine learning model, optimized via a genetic algorithm, demonstrated superior performance and more accurate evaluation results. Engineering validation confirmed that this model provides a more precise and realistic assessment of grouting effects compared with traditional methods.

Keywords: Correlation analysis; Grouting effect evaluation; Machine learning; Optimization algorithm.


Addressing medical student burnout through informal peer-assisted learning: a correlational analysis

Paola Campillo, Frances Ramírez de Arellano, Isabel C. Gómez, Natalia Jiménez, Joan Boada-Grau & Legier V. Rojas

BMC Medical Education, volume 24, article number 460 (2024)


Despite the recognized advantages of Peer-Assisted Learning (PAL) in academic settings, there is a notable absence of research analyzing its effects on students' Academic Burnout. This study aims to fill this gap by assessing the effectiveness of Informal Peer-Assisted Learning (IPAL) as a cooperative learning method, focusing on its potential to mitigate academic burnout among medical students.

In 2022, a cross-sectional study was conducted at the School of Medicine, Universidad Central del Caribe, in Puerto Rico. The research team gathered data from 151 participants, 49.19% of the 307-student body. This cohort included 76 female students, 71 male students, and 4 individuals who selected "other." The School Burnout Inventory questionnaire (SBI-9) was employed to assess Academic Burnout, along with an added query about self-reported IPAL. The SBI-9 underwent validation to ascertain its reliability and validity, incorporating Exploratory Factor Analysis and Confirmatory Factor Analysis. Following this, the investigators analyzed the correlation between academic burnout levels and involvement in IPAL.

The validation process of the questionnaire affirmed its alignment with an eight-item inventory, encapsulating two principal factors that elucidate academic burnout. The first factor pertains to exhaustion, while the second encompasses the combined subscales of cynicism and inadequacy.

The questionnaire shows high reliability (Cronbach's alpha = 0.829) and good fit indices (Comparative Fit Index = 0.934; Tucker-Lewis Index = 0.902; Standardized Root Mean Squared Residual = 0.0495; Root Mean Squared Error of Approximation = 0.09791; p-value < 0.001). The factors retained in the selected model were used to evaluate the correlation between Academic Burnout and IPAL. Students engaged in IPAL showed significantly lower academic burnout prevalence than those who never participated in such practices, with a mean academic burnout score of 44.75% (SD 18.50) for IPAL-engaged students versus 54.89% (SD 23.71) for those who never engaged (p-value < 0.013). Furthermore, within the group engaged in IPAL, students displayed lower levels of cynicism/inadequacy (41.98%, SD 23.41) compared to exhaustion (52.25%, SD 22.42), with p-value < 0.001.

Conclusions

The results of this study underscore a notable issue of academic burnout among medical students within the surveyed cohort. The investigation reveals a significant correlation between Academic Burnout and IPAL, suggesting that incorporating IPAL strategies may be beneficial in addressing burnout in medical education settings. However, further research is needed to explore potential causal mechanisms.


Burnout, characterized by overwhelming mental and physical exhaustion, presents a critical concern within the medical student community. This phenomenon is strongly associated with reduced feelings of achievement and depersonalization, potentially leading to adverse student outcomes, such as poor academic performance, compromised mental health, increased dropout rates, and even suicidal ideation [1, 2]. A correlation between burnout and academic performance has been demonstrated, with burnout emerging as a negative predictor of academic achievement across measures such as exams, grades, and GPA, reaffirming the importance of addressing burnout to safeguard students' academic success and overall health [3, 4].

The nine-item School Burnout Inventory (SBI-9) questionnaire supplies a standardized tool for assessing academic burnout (ABO), encompassing three key sub-scales: exhaustion (EX), cynicism (CY), and inadequacy (IN) [5]. These metrics, along with others, have been instrumental in shaping our understanding of burnout as a psychological syndrome [6] and have contributed to the International Classification of Diseases-11 definition, characterizing burnout as an occupational phenomenon resulting from chronic workplace stress that has not been effectively managed [7].

The prevalence of ABO among medical students has been on the rise, evidenced by a 6% increase in burnout levels in the United States from 2008 to 2014 [8], with estimates suggesting that half of all medical students worldwide experience ABO even before entering residency [9]. Preliminary research conducted at the Universidad Central del Caribe (UCC) also showed elevated levels of ABO among its medical students [10].

Despite various support systems implemented by medical schools [11, 12, 13], effective strategies to mitigate ABO are still lacking. Recognizing that students and healthcare professionals experiencing burnout are more susceptible to unprofessional behavior, it is imperative to promote and supply effective support mechanisms to mitigate ABO [14, 15].

Recent research has shown that the learning environment significantly influences ABO rates among medical students, with lower learning environment scores correlating with higher burnout rates [16]. In this context, Peer-Assisted Learning (PAL) has been identified as an effective strategy for enhancing student wellness [17, 18, 19, 20, 21]. PAL encompasses a spectrum of peer-to-peer educational activities, including near-peer assisted learning, where more experienced students guide their less-experienced counterparts [21]. This approach has been shown to foster essential skills such as problem-solving, critical thinking, and effective communication [11, 22, 23].

Informal PAL (IPAL), unlike its formal counterparts, develops organically through social networks and study groups among students, fostering a unique environment for collaborative learning and knowledge exchange without direct faculty or institutional oversight [24]. Although lacking a formal structure, IPAL offers opportunities for knowledge exchange and collaborative learning, contributing to students' learning outcomes and overall academic success [25]. Additionally, it enhances students' self-efficacy, coping skills, and social support networks, all essential for academic success [26]. Research shows that peer learning improves students' comprehension of the subject matter and boosts their confidence in their roles [27].

While PAL is recognized for its various advantages in academic settings, there remains a gap in the literature concerning its impact on students' ABO. This lack of research highlights a crucial area of investigation, particularly in the high-pressure environment of medical education. Building upon this framework, our investigation has two primary objectives: first, to estimate ABO within our cohort of medical students, and second, to evaluate and elucidate the relationship between ABO and IPAL among these students. Guided by these aims, our research is driven by two primary questions: (1) Can the SBI-9 be considered a valid and reliable tool for assessing ABO in our context? and (2) What is the correlation between ABO and IPAL among medical students? By addressing these questions, our study aims to contribute to the broader understanding of strategies for mitigating burnout in medical education and to offer evidence-based recommendations for promoting IPAL. Partial results from this study were presented at the December 2022 conference of the Medical Association of Puerto Rico [28].

Survey: measurement tools

We conducted a cross-sectional study using the nine-item School Burnout Inventory (SBI-9), administered online to assess Academic Burnout (ABO) among medical students [5]. Participation was voluntary, with students self-reporting their gender, age range, and academic standing. The Institutional Review Board (IRB) of the UCC approved the method and corresponding protocols (054–2022-25–06-IRB).

The SBI-9 was provided in both its original English form [5] and a Spanish-adapted version [29] to meet the bilingual needs of our university context (refer to Supplementary Material 1). We followed established standards for translating and adapting assessment instruments [30].

The SBI-9 questionnaire, which is freely available for research purposes, was chosen to assess ABO due to its strong psychometric properties and its comprehensive approach in university settings. The SBI-9 is specifically structured into three subscales: Exhaustion (EX) with four items, Cynicism (CY) with three items, and Inadequacy (IN) with two items. These sub-scales enable a nuanced examination of the several factors of ABO, assisting in the identification and reduction of potential confounding factors that contribute to student burnout.

Rating scale

Participants rated each SBI-9 item on a five-point Likert scale: 1 (complete disagreement), 2 (disagree), 3 (neutral), 4 (agree), and 5 (complete agreement). The concise five-point scale was chosen deliberately, trading the capture of subtle nuances in students' opinions for a compact representation.

Measurement of informal peer-assisted learning

We evaluated IPAL engagement through a single item asking students how often they explain concepts to peers during informal study sessions, in order to understand informal collaborative learning behaviors among medical students. The question (in Spanish and English) read: "Aunque estudie solo(a) generalmente explico los conceptos a mis compañeros"; in English, "Although I study alone, I usually explain concepts to my colleagues" (before the questionnaire was submitted, the students agreed that the word "colleagues" referred to their classmates). Responses were scored as 'never' (NE) = 0, 'occasionally' (O) = 3, and 'frequently' (F) = 5. For interpretation, responses were grouped into two categories: those who indicated they 'never' (NE) engaged in the behavior and those who responded 'occasionally' or 'frequently' (O/F). This grouping strategy was adopted after considering the distribution of responses and the nuances in students' opinions.
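The authors analyzed their data in Jamovi; purely as an illustration, the recoding described above might look like this in R (the data frame and column names are hypothetical):

```r
# Hypothetical toy data standing in for the survey responses
df <- data.frame(ipal_response = c("never", "occasionally", "frequently", "occasionally"))

# Map raw IPAL responses to the scores described above (NE = 0, O = 3, F = 5)
ipal_scores <- c(never = 0, occasionally = 3, frequently = 5)
df$ipal_score <- ipal_scores[df$ipal_response]

# Collapse into the two analysis groups: NE vs. O/F
df$ipal_group <- ifelse(df$ipal_response == "never", "NE", "O/F")
```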

Study sample

In January 2022, we conducted a cross-sectional study involving 151 participants, representing 49.19% of the medical student population (n = 307) at the UCC in Puerto Rico. This sample size provides a confidence level exceeding 90% with a 5% margin of error. Among the participants, 76 identified as female, 71 as male, and 4 did not specify their gender. The inclusion criteria encompassed medical students in their 1st to 4th year, aged 21 years or older. Additional demographic information, alongside corresponding ABO levels and parameters, is detailed in Supplementary Material 2.

ABO Calculations

The overall ABO calculation was carried out using the eight-item version of the SBI (SBI-8) [31], with high ABO defined as averages above 50%. For graphical analyses, data were aggregated from the entire sample, merging English and Spanish responses; each respondent's Likert-scale values were converted into percentages, which were then averaged and statistically processed.

Statistical analysis

The process of establishing the factors influencing ABO involved several key steps.

We initiated our statistical approach with a Principal Component Analysis (PCA) to discern the main components contributing to ABO. Following PCA, we conducted Exploratory Factor Analysis (EFA) and validated our findings through Confirmatory Factor Analysis (CFA), referencing Gaussian Graphical Models [32] for additional insight (see Supplementary Material 3).

Cronbach's alpha was used to assess the internal consistency reliability of the scales, providing a measure of the extent to which all the items in the scale are correlated with each other. For validating the SBI in our medical student cohort, we adhered to Hu and Bentler's (1999) cutoff criteria for fit indexes [33].

The EFA, performed using Jamovi for Windows, followed procedures modeled after Coşkun et al. (2023) [34]. We initially assessed data suitability for factor analysis by examining the correlation matrix and applying Bartlett's Test of Sphericity alongside the Kaiser–Meyer–Olkin Measure of Sampling Adequacy (KMO MSA).

Our preliminary assessment evaluated the correlation matrix. Conducting factor analysis does not make sense if no inter-item correlation exceeds 0.30 [35]. Correlation values (Spearman's rho) among items exceeded the threshold (except for item EX3, which was excluded from the analysis), indicating adequacy for the EFA. In our case, we allowed correlations greater than 0.2: although not very high, they indicate some relationship between the variables, and, given the nature of the data, the inclusion of these variables was justified by Bartlett's Test of Sphericity (χ2 = 430, p < 0.001) and a satisfactory KMO MSA value of 0.815, confirming the dataset's appropriateness for factor analysis [35].

In determining the optimal number of factors, we employed three strategies: (a) the eigenvalue cut-off rule, (b) the "elbow" in the scree plot, and (c) a fixed number of factors. Direct Oblimin, an oblique rotation technique, was deemed suitable given the norm of factor intercorrelation in social science studies [36]. We accepted a factor loading of 0.40 as the threshold for considering a factor stable [37].

EFA identified significant factor loadings, with values for Factor-1 (EX) ranging from 0.30 (minimum acceptable) to 0.78, and for Factor-2 (CYIN) from 0.53 to 0.84. Subsequent PCA supported these findings, indicating component loadings from 0.43 to 0.88 for component 1, and 0.71 to 0.85 for component 2.

A two-factor model emerged from the EFA: Factor 1 encompassing EX and Factor-2 combining CY and IN (CYIN). CFA evaluated this model, with goodness-of-fit indices suggesting a well-fit model: CMIN/df 2.45, CFI (Comparative Fit Index) 0.93, TLI (Tucker‐Lewis Index) 0.90, RMSEA (Root Mean Square Error of Approximation) 0.098 (0.06–0.13), SRMR (Standardized Root Mean Square Residual) 0.05. Standardized regression weights varied between 0.42 and 0.72, affirming the model’s stability and relevance, evidenced by a Cronbach’s alpha coefficient of 0.828.

To visually present our findings regarding ABO, we used GraphPad Prism v.9. We also performed additional analyses, including Pearson correlation coefficients and ordinary one-way ANOVA. For the Exploratory and Confirmatory Factor Analyses, as well as multiple correlation comparisons and path-model mediation, we used Jamovi v2.3 with R subroutines (The Jamovi Project, 2022, https://www.jamovi.org).
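The analyses above were run in Jamovi and GraphPad Prism; as a rough R analogue (an assumption for illustration, not the authors' code), the EFA/CFA pipeline might look like the following, using the psych and lavaan packages with hypothetical item names:

```r
library(psych)
library(lavaan)

# Hypothetical stand-in data: in practice, df would hold the eight retained
# SBI items (names EX1, EX2, EX4, CY1-CY3, IN1, IN2 are assumptions)
set.seed(42)
df <- as.data.frame(matrix(sample(1:5, 151 * 8, replace = TRUE), ncol = 8))
names(df) <- c("EX1", "EX2", "EX4", "CY1", "CY2", "CY3", "IN1", "IN2")

# Suitability checks: sampling adequacy and sphericity
KMO(df)
cortest.bartlett(cor(df), n = nrow(df))

# Internal consistency (Cronbach's alpha)
alpha(df)

# EFA: two factors with oblique (oblimin) rotation
efa <- fa(df, nfactors = 2, rotate = "oblimin")
print(efa$loadings, cutoff = 0.40)

# CFA of the two-factor model (EX vs. combined CYIN)
model <- "
  EX   =~ EX1 + EX2 + EX4
  CYIN =~ CY1 + CY2 + CY3 + IN1 + IN2
"
fit <- cfa(model, data = df)
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
```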

Student burnout inventory validation

The internal consistency of the SBI-9 was confirmed through a correlation matrix and Cronbach’s alpha, which revealed a high reliability coefficient of 0.913. PCA was conducted to identify the underlying structure within the data, choosing the most suitable model based on eigenvalues greater than 1 and a factor loading threshold of 0.4, using Oblimin rotation to facilitate interpretation. We explored the data with EFA to identify the underlying structure. This analysis revealed two loaded components: one composed of EX items (excluding EX3) and the other combining CY and IN items into a singular CYIN component. Both the significance of Bartlett's Test of sphericity (χ2 = 430, p  < 0.001) and the KMO MSA (0.825, range 0.788–0.855) confirmed the data’s suitability for factor analysis.

The exclusion of item EX3, due to minimal correlation within the EX subscale of ABO (Tables 1 and 2), and the fusion of CY with IN to create Fc2 were further confirmed by the Gaussian Graphical Model (GGM) [32] (refer to Supplementary Material 3). This led to the adoption of the Puerto Rican version of the SBI, now referred to as SBI-8, with item EX3 removed from subsequent analyses. In all instances, our results align with the models (refer to Table 3), excluding item EX3, which demonstrated elevated uniqueness (0.94; CI 0.91–0.97).

CFA further validated these findings, supporting the two-factor model as most representative of our data (Table 3). This model effectively captures the dimensions of ABO within our medical student cohort. Operating under this premise, we evaluated five models of the SBI-8, as delineated in Table 3, to identify the one that most accurately aligns with our observed results.

The five models presented various configurations, with Model M2 of the SBI-8 emerging as the most suitable. In Model M2, CY and IN items were combined into one factor (Fc2), while EX items formed another (Fc1). ABO, as measured by the SBI-8 under Model M2, showed the most robust statistical consistency. The CFA and reliability analysis yielded a Cronbach's α of 0.927, signifying excellent internal consistency, and the high KMO value for Model M2 (> 0.82) confirmed excellent sample adequacy for all eight items. Model M2's χ2, TLI, CFI, RMSEA, and SRMR, with a p-value < 0.001, showed a good fit to the data (refer to Table 3).

Academic burnout in medical students

The data collected from our survey, analyzed under the two-factor Model M2 derived from the SBI-8 (as depicted in Table 3), allow for precise categorization of ABO percentages among participants by academic year. The analysis revealed no statistically significant variation in ABO values across academic years, from first year (MS1) to fourth year (MS4) (refer to Fig. 1).

Figure 1. Academic burnout (ABO) per academic year of medical students, from first year (MS1) to fourth year (MS4), shown with 95% confidence intervals (CI). ABO values for MS1 and MS4 were lower than for MS2 and MS3.

From Fig. 1, the mean ABO percentages (SD) and number of respondents (percentage of the sample, N = 151) for each academic year were as follows: MS1, 41.34 (SD 23.19), 38 respondents (25%); MS2, 50.50 (SD 17.44), 56 (37%); MS3, 51.75 (SD 20.98), 27 (18%); and MS4, 39.54 (SD 23.06), 30 (20%).

The percentage of students with ABO values above 50% in each year was as follows: MS1, 26.32% (10 of 38 respondents); MS2, 51.79% (29 of 56); MS3, 51.85% (14 of 27); and MS4, 33.33% (10 of 30).

Our gender-based analysis showed no significant differences in ABO levels: males reported an average ABO of 44.76% (SD 19.16, n = 71) and females 48.68% (SD 23.45, n = 76). Similarly, language preference, Spanish (47.31%, SD 21.82, n = 111) or English (47.36%, SD 20.94, n = 40), did not significantly impact ABO scores. Additional demographic details are available in Supplementary Material 2.

Four students who did not disclose their gender (average ABO 60.42%, SD 11.42, n = 4) were excluded from the gender-specific analysis due to the small subgroup size.

Analysis of factors contributing to academic burnout in medical students

Reliability analysis for the SBI-8, assessed with Cronbach's α coefficient, showed high internal consistency (α = 0.829). Importantly, the analysis indicated that the CYIN factor (Fc2) consistently showed lower values than the global EX factor (Fc1), represented as an empty circle and square, respectively (Fig. 2). This difference was statistically significant (p < 0.01), as illustrated in Fig. 2 (left). However, when comparing EX and CYIN percentages across medical school years, no distinct difference emerged between the two factors (Fig. 2, right).

Figure 2. Factors contributing to academic burnout (ABO): contributions of Fc1 (EX) and Fc2 (CYIN) globally (left) and per academic year (right), shown as mean percentages with 95% confidence intervals (CI). Overall values (mean percentage, SD, n), after excluding item EX3 under the final M2 model: Fc1 (EX), 53.26 (22.40), N = 151; Fc2 (CYIN), 43.88 (24.68), N = 151; the global difference between the two factors is statistically significant. Per academic year, Fc1 (EX): MS1, 51.75 (26.16), N = 38; MS2, 59.08 (19.22), N = 56; MS3, 52.65 (19.48), N = 22; MS4, 42.82 (23.44), N = 29. Fc2 (CYIN): MS1, 42.60 (28.28), N = 38; MS2, 46.21 (20.61), N = 56; MS3, 48.01 (28.15), N = 22; MS4, 38.24 (26.83), N = 28.

Diminished academic burnout in medical students engaged in informal peer assisted learning

As depicted in Fig. 3A, our results indicate that medical students engaged in IPAL experience lower levels of ABO than peers who reported never tutoring their peers. Specifically, students reporting occasional or frequent engagement in IPAL (O/F) displayed a mean ABO score of 44.75% (SD 18.50; 126 students, 83% of respondents), lower than the 54.89% (SD 23.71) observed for the 25 students (17% of respondents) who never engaged in IPAL (NE). This difference was statistically significant (p = 0.0133).

Figure 3. A: Cumulative probability of academic burnout (ABO) percentages for medical students who reported never teaching their peers (NE, filled circles) versus those who did so occasionally or frequently (O/F, clear circles), excluding item EX3 per the CFA (model M2). The O/F group is shifted to the left, indicating a lower average ABO: O/F, 44.75 (SD 18.50), N = 126; NE, 54.89 (SD 23.71), N = 25; the O/F group had a statistically significantly lower proportion of academic burnout (p = 0.0133). B: The factors Fc1 (EX) and Fc2 (CYIN) in relation to IPAL engagement. Among students who engage in IPAL (O/F), the Fc2 percentage is statistically lower than Fc1 (41.98, SD 23.41 vs. 52.25, SD 22.42; N = 126; p < 0.001); among students who do not (NE), there is no significant difference (Fc2: 56.33, SD 30.65 vs. Fc1: 58.33, SD 22.05; N = 25). Values are mean percentages, standard deviations (SD), and sample sizes.

Further examination of the two-factor Model M2, as presented in Fig. 3B, highlights that the reduction in ABO among IPAL-participating students is particularly pronounced in the CYIN factor (Fc2), which was significantly lower than the EX factor (Fc1) (p < 0.001).

Figure 3B delineates the detailed breakdown of these factors, comparing the percentages for students who engaged in IPAL (O/F) versus those who did not (NE). Fc1 (EX) for the O/F group was 52.25% (SD 22.42; 126 respondents, 83%), lower than the NE group's 58.33% (SD 22.05; 25 respondents, 17%). Similarly, Fc2 (CYIN) for the O/F group was 41.98% (SD 23.41), less than the NE group's 56.33% (SD 30.65).

Our findings validate the use of the School Burnout Inventory (SBI) for our sample. The validation process confirmed an eight-item inventory (SBI-8) with two principal factors of ABO: EX and a combined measure of CY and IN (CYIN). Notably, this two-factor Model M2 (employing the SBI-8) emerged as the most appropriate (Table 3), consistent with findings from other studies using the SBI-9 and SBI-8 [38, 39]. The validated model underscores the interrelated nature of CY and IN, suggesting common underlying issues, such as a lack of support or resources at school, or a mismatch between students' skills and academic demands. This model has implications for interventions aimed at reducing burnout, as addressing one factor may help alleviate the other. For example, interventions that improve students' skills and resources, or that better match students with their academic demands, could potentially alleviate both CY and feelings of IN. This two-factor model supplies a simplified and potentially more actionable framework for understanding and addressing ABO among medical students. However, further research is needed to fully understand ABO and to find the most effective interventions for alleviating it.

The prevalence of ABO in our medical school mirrors levels reported in medical schools across the United States [1, 40]. Despite our school's abundance of support resources and emphasis on the availability of help, the persistence of ABO underscores a notable issue among the surveyed cohort. This pattern is not unique to our institution but reflects a broader challenge faced by many educational institutions [11, 41, 42].

Our study introduces a unique perspective by delving into the role of IPAL in the experience of ABO among medical students, offering valuable insights into this critical issue. The pivotal finding is the significant (p < 0.013) decrease in ABO levels among medical students who engage in IPAL compared to those who do not (Fig. 3A): 44.75% (SD 18.50) for IPAL-engaged students versus 54.89% (SD 23.71) for those who never engaged in such practices. Moreover, our analysis reveals that medical students engaged in IPAL show a significant reduction (p < 0.001) in the combined levels of CY and IN (O/F-CYIN) compared to EX (O/F-EX), as illustrated in Fig. 3B. This translates into a significant reduction in ABO among students participating in IPAL (O/F) compared to those who do not participate at all (NE). These findings suggest the potential of IPAL as a mitigating factor against ABO in our academic environment.

Furthermore, our findings suggest that the factors Fc2 (CYIN) and Fc1 (EX) are linked to increased ABO levels in students who reported never (NE) taking part in IPAL (Fig.  3 B). While the specific mechanisms behind this association were not the focus of our initial study, the observed correlation prompts a deeper investigation. The fact that students with lower ABO levels may be more predisposed to engage in IPAL raises questions about the direction of this relationship. Given the significance of this finding, further detailed studies are called for to understand the causality behind these dynamics.

Preliminary analyses, as outlined in Supplementary Material 4, show that IPAL directly reduces ABO, particularly by diminishing the CYIN (Fc2) component, rather than acting through a mediating effect on overall ABO. This contrasts with a common assumption about mediating factors: instead of indirectly affecting overall ABO through different paths, IPAL directly targets and reduces the specific elements of CY and IN. The statistical significance of IPAL's direct impact on CYIN suggests that its effect is not due to random chance. Therefore, we recommend that interventions aiming to reduce ABO prioritize IPAL, focusing specifically on lowering CY and IN (Fc2). Further examination reveals that while IPAL significantly affects the CYIN component of ABO, its influence on the EX component (Fc1) is minimal or non-existent (refer to Supplementary Material 4), suggesting that IPAL's benefits may be more psychological and social than physical or emotional. This distinction is critical because it adds insight into potential strategies for mitigating ABO in medical students. Therefore, further research is needed to develop a comprehensive understanding of ABO and of how IPAL can help alleviate it [1, 9]. While many studies have shed light on factors that mitigate ABO, none has specifically examined the impact of peer learning on ABO.

Our findings demonstrate that students engaging in IPAL either occasionally or frequently (O/F) exhibit significantly lower levels of CYIN compared to their levels of EX. This distinction underlines the potential of IPAL as a targeted strategy to address specific components of ABO. However, earlier studies have highlighted the dynamic nature of peer learning: a student's enthusiasm for and engagement in peer learning can vary over time [19, 43, 44], which could affect the effectiveness of IPAL. Through regular IPAL assessments, it could be possible to proactively detect and address these fluctuations, implementing the right interventions to sustain the benefits. By fostering a supportive community that encourages collaboration, IPAL has the potential to significantly reduce ABO. This, in turn, enhances learning efficiency and helps students develop effective coping strategies, addressing the multifaceted nature of ABO by offering psychological, social, and academic support [17–21, 45–47].

Limitations

Our study has several limitations. First, due to its cross-sectional design, it lacks a control group, limiting our capability to make temporal comparisons concerning ABO rates and other aspects of medical students’ well-being throughout their careers. Future studies should consider longitudinal designs to enable more effective comparisons over time.

Second, our study had limited medical student participation, with a 49.19% response rate (151 of 307 medical students), which introduces the potential for response-rate bias. This bias could affect the results if, for example, students experiencing higher levels of distress were either less likely or more likely to participate, given the subject matter's pertinence. However, such patterns were not evident in our analysis.

Third, our research was conducted at a single medical school, restricting the generalizability of our findings to the broader medical student population in Puerto Rico.

Lastly, the nature of our questionnaire limited our ability to collect comprehensive psychological and personal data from the students, thus narrowing the study’s overall depth. Future studies should consider exploring a broader array of factors, such as studying conditions, to provide a more holistic understanding of the ABO experiences among medical students.

Our research presents compelling evidence of a widespread ABO issue among medical students in our study population, with observed levels alarmingly aligning with trends seen in medical schools throughout the United States. This issue underscores an urgent need for immediate and targeted intervention strategies to mitigate these ABO levels.

In addressing our first research question, our findings confirm that the SBI, particularly the SBI-8, serves as a valid and reliable instrument for assessing ABO in the context of our study. This validation offers a foundation for accurately measuring ABO levels among medical students.

Turning to our second research question, the data reveals a significant correlation between ABO and IPAL. Our data indicates that students engaged in IPAL, whether occasionally or frequently, exhibit notably lower levels of cynicism and inadequacy, two critical dimensions of ABO. This finding not only reaffirms the value of IPAL as an academic practice but also positions it as a viable method for reducing elements of ABO among medical students. Given this correlation, we advocate for the promotion of IPAL within medical curricula as a proactive approach to reduce ABO.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

ABO: Academic Burnout

CFA: Confirmatory Factor Analysis

CYIN: Cynicism/Inadequacy

EFA: Exploratory Factor Analysis

Fc1, Fc2: Factor 1, Factor 2

GGM: Gaussian Graphical Model

IPAL: Informal Peer-Assisted Learning

MS1–MS4: Medical students' years 1, 2, 3, and 4, respectively

O/F: Occasionally/Frequently

PCA: Principal Component Analysis

SBI-8: School Burnout Inventory (8 items)

SBI-9: School Burnout Inventory (9 items)

Dyrbye LN, Thomas MR, Shanafelt TD. Systematic review of depression, anxiety, and other indicators of psychological distress among US and Canadian medical students. Acad Med. 2006;81(4):354–73. https://doi.org/10.1097/00001888-200604000-00009 .


Edú-Valsania S, Laguía A, Moriano JA. Burnout: A Review of Theory and Measurement. Int J Environ Res Public Health. 2022;19(3):1780. https://doi.org/10.3390/ijerph19031780 .

Ilić IM, Ilić IM, Arandjelović MŽ, Jovanović JM, Nešić MM. Relationships of work-related psychosocial risks, stress, individual factors and burnout: Questionnaire survey among emergency physicians and nurses. Medycyna Pracy. 2017;68(2):167–78. https://doi.org/10.13075/mp.5893.00516 .

Madigan DJ, Olsson LF, Hill AP, Curran T. Athlete Burnout Symptoms Are Increasing: A Cross-Temporal Meta-Analysis of Average Levels From 1997 to 2019. J Sport Exerc Psychol. 2022;44(3):153–68. https://doi.org/10.1123/jsep.2020-0291 .

Salmela-Aro K, Kiuru N, Leskinen E, Nurmi J-E. School Burnout Inventory (SBI). Eur J Psychol Assess. 2009;25(1):48–57. https://doi.org/10.1027/1015-5759.25.1.48 .

Maslach C, Leiter MP. Understanding the burnout experience: Recent research and its implications for psychiatry. World Psychiatry. 2016;15(2):103.

World Health Organization. Burn-out an "occupational phenomenon": International Classification of Diseases. 2019 [cited 2023 Sep 20]. Available from: https://www.who.int/news/item/28-05-2019-burn-out-an-occupational-phenomenon-international-classification-of-diseases

Ofei-Dodoo S, Moser SE, Kellerman R, Wipperman J, Paolo A. Burnout and Other Types of Emotional Distress Among Medical Students. Medical Science Educator. 2019;29(4):1061–9. https://doi.org/10.1007/s40670-019-00810-5 .

Frajerman A, Morvan Y, Krebs M-O, Gorwood P, Chaumette B. Burnout in medical students before residency: A systematic review and meta-analysis. European Psychiatry: The Journal of the Association of European Psychiatrists. 2019;55:36–42. https://doi.org/10.1016/j.eurpsy.2018.08.006 .

Vázquez K, Rivera N, Zorrilla F, Piñeiro Z, Rojas LV. Academic Burnout among female medical students during their pre-clinical years in Puerto Rico. Philadelphia, USA: Poster presented at: 103rd American Medical Women’s Association (AMWA) Annual Meeting; 2018.


Klein HJ, McCarthy SM. Student wellness trends and interventions in medical education: a narrative review. Humanities and Social Sciences Communications. 2022;9(1), Article 1. https://doi.org/10.1057/s41599-022-01105-8 .

Popa-Velea O, Diaconescu L, Mihăilescu A, Jidveian Popescu M, Macarie G. Burnout and Its Relationships with Alexithymia, Stress, and Social Support among Romanian Medical Students: A Cross-Sectional Study. Int J Environ Res Public Health. 2017;14(6):560. https://doi.org/10.3390/ijerph14060560 .

Silva V, Costa P, Pereira I, Faria R, Salgueira AP, Costa, et al. Depression in medical students: Insights from a longitudinal study. BMC Med Educ. 2017;17(1):184. https://doi.org/10.1186/s12909-017-1006-0 .

Ishak WW, Lederer S, Mandili C, Nikravesh R, Seligman L, Vasa M, et al. Burnout during residency training: A literature review. J Grad Med Educ. 2009;1(2):236–42. https://doi.org/10.4300/JGME-D-09-00054.1 .

Wood DF. Mens sana in corpore sano: Student well-being and the development of resilience. Med Educ. 2016;50(1):20–3. https://doi.org/10.1111/medu.12934 .

O’Marr JM, Chan SM, Crawford L, Wong AH, Samuels E, Boatright D. Perceptions on Burnout and the Medical School Learning Environment of Medical Students Who Are Underrepresented in Medicine. JAMA Netw Open. 2022;5(2):e220115. https://doi.org/10.1001/jamanetworkopen.2022.0115 .

Avonts M, Bombeke K, Michels NR, Vanderveken OM, De Winter BY. How can peer teaching influence the development of medical students? A descriptive, longitudinal interview study. BMC Med Educ. 2023;23(1):861. https://doi.org/10.1186/s12909-023-04801-4 .

de Menezes S, Premnath D. Near-peer education: A novel teaching program. Int J Med Educ. 2016;7:160–7. https://doi.org/10.5116/ijme.5738.3c28 .

Hall S, Harrison CH, Stephens J, Andrade MG, Seaby EG, Parton W, McElligott S, Myers MA, Elmansouri A, Ahn M, Parrott R, Smith CF, Border S. The benefits of being a near-peer teacher. Clin Teach. 2018;15(5):403–7. https://doi.org/10.1111/tct.12784 .

Janzen K, Latiolais CA, Nguyen K, Dinh A, Giang D, et al. Impact of a near-peer teaching program within a college of pharmacy on interest in mentoring roles. Curr Pharm Teach Learn. 2023;15(4):408–13. https://doi.org/10.1016/j.cptl.2023.04.008 .

Olaussen A, Reddy P, Irvine S, Williams B. Peer-assisted learning: Time for nomenclature clarification. Med Educ Online. 2016;21(1):30974. https://doi.org/10.3402/meo.v21.30974 .

Topping KJ. Peer Education and Peer Counselling for Health and Well-Being: A Review of Reviews. Int J Environ Res Public Health. 2022;19(10):6064. https://doi.org/10.3390/ijerph19106064 .

Shenoy A, Petersen KH. Peer Tutoring in Preclinical Medical Education: A Review of the Literature. Medical Science Educator. 2019;30(1):537–44. https://doi.org/10.1007/s40670-019-00895-y .

Morris TJ, Collins S, Hart J. Informal peer-assisted learning amongst medical students: A qualitative perspective. Clin Teach. n.d.:e13721. https://doi.org/10.1111/tct.13721 .

Bowyer ER, Shaw SCK. Informal Near-Peer Teaching in Medical Education: A Scoping Review. Education for Health. 2021;34(1):29. https://doi.org/10.4103/efh.EfH_20_18 .

Tai-Seale M, Dillon EC, Yang Y, Nordgren R, Steinberg RL, Nauenberg T, et al. Physicians’ Well-Being Linked To In-Basket Messages Generated By Algorithms In Electronic Health Records. Health Affairs (Project Hope). 2019;38(7):1073–8. https://doi.org/10.1377/hlthaff.2018.05509 .

Bulte C, Betts A, Garner K, Durning S. Student teaching: Views of student near-peer teachers and learners. Med Teach. 2007;29(6):583–90. https://doi.org/10.1080/01421590701583824 .

Campillo P, Ramírez F, Rojas LV. Burnout in UCC’s Medical Students: Implications in Collaborative Learning. Poster presented at: 21st Annual Convention of the College of Physicians and Surgeons of Puerto Rico; 2022; San Juan, PR.

Boada-Grau J, Merino-Tejedor E, Sánchez-García J-C, Prizmic-Kuzmica A-J, Vigil-Colet A. Adaptation and psychometric properties of the SBI-U scale for Academic Burnout in university students. Anales de Psicología. 2015;31(1):Article 1. https://doi.org/10.6018/analesps.31.1.168581 .

Muñiz J, Bartram D. Improving International Tests and Testing. Eur Psychol. 2007;12(3):206–19. https://doi.org/10.1027/1016-9040.12.3.206 .

Carmona-Halty M, Mena-Chamorro P, Sepúlveda-Páez G, Ferrer-Urbina R. School Burnout Inventory: Factorial Validity, Reliability, and Measurement Invariance in a Chilean Sample of High School Students. Front Psychol. 2022;12:774703. https://doi.org/10.3389/fpsyg.2021.774703 .

Bhushan N, Mohnert F, Sloot D, Jans L, Albers C, Steg L. Using a Gaussian Graphical Model to Explore Relationships Between Items and Variables in Environmental Psychology Research. Front Psychol. 2019;10:1050. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.01050 .

Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6(1):1–55. https://doi.org/10.1080/10705519909540118 .

Coşkun Ö, Timurçin U, Kıyak YS, Budakoğlu Iİ. Validation of IFMSA social accountability assessment tool: exploratory and confirmatory factor analysis. BMC Med Educ. 2023;23(1):138. https://doi.org/10.1186/s12909-023-04121-7 .

Tabachnick BG, Fidell LS. Using multivariate statistics. 7th ed. Upper Saddle River (NJ): Pearson; 2019. https://www.pearson.com/ .

Costello AB, Osborne J. Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Pract Assess Res Eval. 2005;10:1–9. https://doi.org/10.7275/JYJ1-4868 .

Guadagnoli E, Velicer WF. Relation of sample size to the stability of component patterns. Psychol Bull. 1988;103:265–75. https://doi.org/10.1037/0033-2909.103.2.265 .

Salmela-Aro K, Upadyaya K. School burnout and engagement in the context of demands-resources model. Br J Educ Psychol. 2014;84(Pt 1):137–51. https://doi.org/10.1111/bjep.12018 .

Hoferichter F, Raufelder D, Schweder S, Salmela-Aro K. Validation and Reliability of the German Version of the School Burnout Inventory. Z Entwicklungspsychol Padagog Psychol. 2022;54(1):1–14.

Jordan RK, Shah SS, Desai H, Tripi J, Mitchell A, Worth RG. Variation of stress levels, burnout, and resilience throughout the academic year in first-year medical students. PLoS ONE. 2020;15(10):e0240667. https://doi.org/10.1371/journal.pone.0240667 .

Dewey C, Hingle S, Goelz E, Linzer M. Supporting Clinicians During the COVID-19 Pandemic. Ann Intern Med. 2020;172(11):752–3. https://doi.org/10.7326/M20-1033 .

Horowitz CR, Suchman AL, Branch WT, Frankel RM. What do doctors find meaningful about their work? Ann Intern Med. 2003;138(9):772–5. https://doi.org/10.7326/0003-4819-138-9-200305060-00028 .

Bugaj TJ, Blohm M, Schmid C, Koehl N, Huber J, Huhn D, Herzog W, Krautter M, Nikendei C, et al. Peer-assisted learning (PAL): Skills lab tutors’ experiences and motivation. BMC Med Educ. 2019;19(1):353. https://doi.org/10.1186/s12909-019-1760-2 .

Giuliodori MJ, Lujan HL, DiCarlo SE. Peer instruction enhanced student performance on qualitative problem-solving questions. Adv Physiol Educ. 2006;30(4):168–73. https://doi.org/10.1152/advan.00013.2006 .

Ten Cate O, Durning S. Peer teaching in medical education: Twelve reasons to move from theory to practice. Med Teach. 2007;29(6):591–9. https://doi.org/10.1080/01421590701606799 .

Loda T, Erschens R, Nikendei C, Zipfel S, Herrmann-Werner A. Qualitative analysis of cognitive and social congruence in peer-assisted learning—The perspectives of medical students, student tutors and lecturers. Med Educ Online. 2020;25(1):1801306. https://doi.org/10.1080/10872981.2020.1801306 .

Tamachi S, Giles JA, Dornan T, Hill EJR. “You understand that whole big situation they’re in”: Interpretative phenomenological analysis of peer-assisted learning. BMC Med Educ. 2018;18(1):197. https://doi.org/10.1186/s12909-018-1291-2 .

Acknowledgements

The authors thank all the students for answering the questionnaires. Thanks to Elisa Ramos-Vásquez for reading the manuscript and for her suggestions on the statistical analysis. The publication cost of this research was supported by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health under award number U54GM133807. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Conflict of interest statement

The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

This research received no internal or external funding.

Author information

Paola Campillo, Frances Ramírez de Arellano and Isabel C. Gómez contributed equally to this work.

Authors and Affiliations

School of Medicine, Universidad Central del Caribe, Bayamón, Puerto Rico, USA

Paola Campillo & Frances Ramírez de Arellano

Cellular-Molecular Biology Dept, University of Puerto Rico (RP), San Juan, Puerto Rico, USA

Isabel C. Gómez

Interdisciplinary Sciences Dept, University of Puerto Rico (RP), San Juan, Puerto Rico, USA

Natalia Jiménez

Education Sciences and Psychology Dept, Universitat Rovira I Virgili, Av. Catalunya, 35, 43002, Tarragona, Spain

Joan Boada-Grau

Physiology Dept. School of Medicine, Universidad Central del Caribe, 100 Av. Laurel, Bayamón, Puerto Rico, 00956, USA

Legier V. Rojas

Contributions

Project Conceptualization: PC, FR, LVR; Intervention Design: PC, FR, ICG, LVR; Supervision and Oversight: ICG, PC, JB-G, LVR; Data Curation: NJ, LVR; Data Analysis: NJ, ICG, LVR; Manuscript Drafting: ICG, PC, LVR; Writing the Main Manuscript Text: ICG, LVR; Preparation of Figures: NJ, ICG, LVR; Manuscript Revisions: ICG, PC, FR, NJ, JB, JB-G, LVR; Final Approval for Submission: ICG, PC, NJ, FR, JB, JB-G, LVR. All authors made a significant contribution to the work reported, whether in the conception, study design, execution, acquisition of data, analysis, and interpretation, or in all these areas; took part in drafting, revising, or critically reviewing the article; gave final approval of the version to be published; agreed on the journal to which the article was submitted; and agree to be accountable for all aspects of the work. PC, FR and ICG share first authorship.

Corresponding author

Correspondence to Legier V. Rojas.

Ethics declarations

Ethics approval and consent to participate.

The methodology and corresponding protocols received approval from the Institutional Review Board (IRB) of the UCC (054–2022-25–06-IRB). Each participant was fully informed about the study protocol and provided written informed consent before taking part in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Campillo, P., de Arellano, F.R., Gómez, I.C. et al. Addressing medical student burnout through informal peer-assisted learning: a correlational analysis. BMC Med Educ 24, 460 (2024). https://doi.org/10.1186/s12909-024-05419-w

Received : 30 November 2023

Accepted : 11 April 2024

Published : 26 April 2024

DOI : https://doi.org/10.1186/s12909-024-05419-w

Keywords:

  • Medical students
  • Academic burnout
  • Peer assisted learning
  • Informal peer assisted learning
  • School burnout inventory

The overall stability of a partially unstable reservoir bank slope to water fluctuation and rainfall based on Bayesian theory

  • Technical Note
  • Published: 25 April 2024

  • Wengang Zhang 1, 2, 3, 5, 6,
  • Songlin Liu 1,
  • Luqi Wang (ORCID: orcid.org/0000-0001-5108-250X) 1, 2, 3,
  • Weixing Sun 1,
  • Yuwei He 1,
  • Yankun Wang 4 &
  • Guanhua Sun 6

In geotechnical analysis, the factor of safety (FOS) is crucial for slope stability assessment. Traditional methods often overlook the nuances of partial slope instability. Accurately determining geotechnical parameters for FOS in complex simulations is challenging and resource-intensive. The limit equilibrium method (LEM), considering unit weight, cohesion, and internal friction angle, is used to address this. This study focuses on the Jiuxianping landslide, analyzing its stability and failure behavior. Utilizing the Bayesian theorem, the study back-analyzes shear strength parameters, considering partial instability and uncertainties in the Janbu corrected method. The parameters’ posterior distribution is determined using the Markov Chain Monte Carlo (MCMC) method and Multivariate Adaptive Regression Splines (MARS) for efficient sampling. These parameters are then used for precise FOS calculation at the critical point of partial instability, corroborated by 2021 data from the Jiuxianping landslide. The study finds that while the entire slope remains stable, partial instability caused by long-term water erosion significantly lowers the FOS by about 10.9%, highlighting its critical impact on overall slope stability.
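The core loop described in this abstract (treating shear strength parameters as random variables, conditioning them on an observed near-failure state, and sampling the posterior with MCMC) can be illustrated with a toy sketch. The snippet below is not the authors' code: it substitutes a simple infinite-slope FOS formula for the Janbu corrected method, omits the MARS surrogate, and every number (geometry, unit weight, priors, observation noise) is an illustrative assumption.

```python
# Hedged sketch (not the paper's code): Bayesian back-analysis of shear
# strength parameters (cohesion c, friction angle phi) with a simple
# random-walk Metropolis MCMC. The infinite-slope FOS below stands in for
# the Janbu corrected method; all numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Assumed slope geometry and unit weight (kN/m^3, m, slope angle)
GAMMA, H, BETA = 19.0, 10.0, np.radians(35.0)

def fos(c, phi_deg):
    """Infinite-slope factor of safety for a dry slope (stand-in model)."""
    phi = np.radians(phi_deg)
    return (c + GAMMA * H * np.cos(BETA) ** 2 * np.tan(phi)) / (
        GAMMA * H * np.sin(BETA) * np.cos(BETA))

# "Observation": the partially unstable zone sits near limit equilibrium
FOS_OBS, SIGMA_OBS = 1.0, 0.05

def log_posterior(theta):
    c, phi_deg = theta
    # Uniform priors over plausible ranges (assumed)
    if not (0.0 < c < 60.0 and 5.0 < phi_deg < 40.0):
        return -np.inf
    # Gaussian likelihood around the observed FOS
    return -0.5 * ((fos(c, phi_deg) - FOS_OBS) / SIGMA_OBS) ** 2

# Random-walk Metropolis sampling of the posterior
theta = np.array([20.0, 20.0])
lp = log_posterior(theta)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(scale=[1.0, 0.5])
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)
samples = np.array(samples)[5000:]  # discard burn-in

print("posterior mean c   = %.1f kPa" % samples[:, 0].mean())
print("posterior mean phi = %.1f deg" % samples[:, 1].mean())
```

In the full pipeline the abstract describes, the expensive limit-equilibrium evaluation inside the likelihood would presumably be replaced by the trained MARS surrogate so that the MCMC chain remains cheap to run.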

Data availability

Data available on request from the authors.

Acknowledgements

The authors are grateful for the financial support from the Cooperation Projects between Chongqing University, Chinese Academy of Sciences and other institutes (HZ2021001), and the Transportation Science and Technology Project of Sichuan Province (2018-ZL-01).

Author information

Authors and Affiliations

School of Civil Engineering, Chongqing University, Chongqing, 400045, China

Wengang Zhang, Songlin Liu, Luqi Wang, Weixing Sun & Yuwei He

Key Laboratory of New Technology for Construction of Cities in Mountain Area, Ministry of Education, Chongqing University, Chongqing, 400045, China

Wengang Zhang & Luqi Wang

National Joint Engineering Research Center of Geohazards Prevention in the Reservoir Areas, Chongqing University, Chongqing, 400045, China

School of Geosciences, Yangtze University, Wuhan, 430100, China

Yankun Wang

Sichuan Yanjiang Panning Expressway Co., Ltd, Xichang, 615000, China

Wengang Zhang

Institute of Rock and Soil Mechanics, Chinese Academy of Sciences, Wuhan, 430071, China

Wengang Zhang & Guanhua Sun

Contributions

Methodology, SL; validation, LW; software, YH; resources, SW; supervision, WZ and GS. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Luqi Wang.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Overall stability of a partially unstable reservoir slope based on Bayesian theory.

About this article

Zhang, W., Liu, S., Wang, L. et al. The overall stability of a partially unstable reservoir bank slope to water fluctuation and rainfall based on Bayesian theory. Landslides (2024). https://doi.org/10.1007/s10346-024-02250-8

Received : 11 April 2023

Accepted : 14 March 2024

Published : 25 April 2024

DOI : https://doi.org/10.1007/s10346-024-02250-8

Keywords:

  • Jiuxianping landslide stability analysis
  • Bayesian theory
  • Multivariate Adaptive Regression Splines
  • Markov Chain Monte Carlo (MCMC) method

ORIGINAL RESEARCH article

This article is part of the research topic Monitoring, Early Warning and Mitigation of Natural and Engineered Slopes – Volume IV.

Slope deformation prediction based on noise reduction and deep learning: a point prediction and probability analysis method (Provisionally Accepted)

  • 1 Hunan Provincial Communications Planning, Survey and Design Institute Co., Ltd, China
  • 2 Hunan Water Planning and Design Institute Co., Ltd, China

The final, formatted version of the article will be published soon.

Slope deformation, a key factor affecting slope stability, is complex and uncertain. Understanding the future development of slope deformation is crucial for early warning of slope instability disasters. In this paper, a model for point prediction and probability analysis of slope deformation based on the DeepAR deep learning algorithm is proposed. In addition, considering the noise in slope measurement data, a Gaussian-filter (GF) algorithm is used to reduce the noise of the data, and the final prediction model is the hybrid GF-DeepAR model. Firstly, the noise reduction effect of the GF algorithm is analyzed using two actual slope engineering cases, and the DeepAR point prediction based on the original data is compared with the GF-DeepAR prediction based on the noise-reduced data. Secondly, to verify the point prediction performance of the proposed model, it is compared with three typical point prediction models, namely GF-LSTM, GF-XGBoost, and GF-SVR. Finally, a probability analysis framework for slope deformation is proposed based on the characteristics of the DeepAR algorithm, and the probability prediction performance of the GF-DeepAR model is compared with that of the GF-GPR and GF-LSTMQR models to further validate its superiority. The results show that: (1) the best noise reduction is achieved at the C1 and D2 sites with a standard deviation σ of 0.5; (2) a comparison before and after noise reduction reveals that the R² values for the C1 and D2 measurement points increased by 0.081 and 0.070, respectively; (3) the prediction intervals constructed by the GF-DeepAR model can effectively envelop the actual slope deformation curves, and the PICP in both C1 and D1 is 100%; (4) for both point prediction and probability prediction, the GF-DeepAR model excels at extracting feature information from slope deformation sequences characterized by randomness and complexity, conducting predictions with high accuracy and reliability and outperforming the other models. The results can provide a reference for slope deformation prediction theory as well as for similar projects.
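As a rough illustration of the "GF" stage described in this abstract, the sketch below applies a one-dimensional Gaussian filter with σ = 0.5 (the value reported above) to a synthetic displacement series and compares R² against the noise-free signal before and after smoothing. The synthetic data, noise level, and R² helper are assumptions for demonstration only; the DeepAR prediction stage is not reproduced here.

```python
# Hedged sketch of the noise-reduction step only: Gaussian filtering of a
# monitored slope-displacement series, analogous to the "GF" stage of the
# GF-DeepAR pipeline. sigma = 0.5 comes from the abstract; everything else
# is an illustrative assumption.
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(42)

# Synthetic cumulative displacement (mm) with measurement noise (assumed data)
t = np.arange(200)
true_disp = 0.15 * t + 5.0 * np.sin(t / 20.0)
observed = true_disp + rng.normal(scale=1.5, size=t.size)

# Gaussian filter with standard deviation sigma = 0.5 (value from abstract)
smoothed = gaussian_filter1d(observed, sigma=0.5)

def r2(y_true, y_pred):
    """Coefficient of determination between a reference and an estimate."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print("R^2 raw vs. truth:      %.3f" % r2(true_disp, observed))
print("R^2 smoothed vs. truth: %.3f" % r2(true_disp, smoothed))
```

In the paper's workflow, the smoothed series (rather than the raw one) would then be fed to the DeepAR model for point and probabilistic forecasting.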

Keywords: slope deformation prediction, deep learning, DeepAR model, Gaussian-filter algorithm, point prediction, probability analysis

Received: 12 Mar 2024; Accepted: 26 Apr 2024.

Copyright: © 2024 Shao, Liu and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: PhD. Fuming Liu, Hunan Water Planning and Design Institute Co., Ltd, Changsha, Hunan Province, China

COMMENTS

  1. Factor Analysis

    Factor Analysis Steps. Here are the general steps involved in conducting a factor analysis: 1. Define the Research Objective: Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis. 2. Data Collection: Gather the data on the variables of interest.

  2. Exploratory Factor Analysis: A Guide to Best Practice

    Abstract. Exploratory factor analysis (EFA) is a multivariate statistical method that has become a fundamental tool in the development and validation of psychological theories and measurements. However, researchers must make several thoughtful and evidence-based methodological decisions while conducting an EFA, and there are a number of options ...

  3. Factor Analysis Guide with an Example

    The first methodology choice for factor analysis is the mathematical approach for extracting the factors from your dataset. The most common choices are maximum likelihood (ML), principal axis factoring (PAF), and principal components analysis (PCA). You should use either ML or PAF most of the time. (A minimal Python sketch of ML extraction with varimax rotation appears after this list.)

  4. Factor Analysis: a means for theory and instrument development in

    Factor analysis methods can be incredibly useful tools for researchers attempting to establish high quality measures of those constructs not directly observed and captured by observation. Specifically, the factor solution derived from an Exploratory Factor Analysis provides a snapshot of the statistical relationships of the key behaviors ...

  5. Factor Analysis and How It Simplifies Research Findings

    Factor analysis isn't a single technique, but a family of statistical methods that can be used to identify the latent factors driving observable variables. Factor analysis is commonly used in market research, as well as other disciplines like technology, medicine, sociology, field biology, education, psychology and many more.

  6. PDF Factor Analysis

    Factor Analysis. Qian-Li Xue, Biostatistics Program, Harvard Catalyst | The Harvard Clinical & Translational Science Center. Short course, October 27, 2016. ... Least-squares method (e.g., principal axis factoring with iterated communalities); maximum likelihood method.

  7. A Practical Introduction to Factor Analysis: Exploratory Factor Analysis

    Purpose. This seminar is the first part of a two-part seminar that introduces central concepts in factor analysis. Part 1 focuses on exploratory factor analysis (EFA). Although the implementation is in SPSS, the ideas carry over to any software program. Part 2 introduces confirmatory factor analysis (CFA).

  8. Lesson 12: Factor Analysis

    Overview. Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) "factors". The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social ...

  9. Factor analysis

    Higher-order factor analysis is a statistical method consisting of repeating the steps of factor analysis ... In cross-cultural research: factor analysis is a frequently used technique in cross-cultural research. It serves the purpose of extracting cultural dimensions.

  10. A Practical Introduction to Factor Analysis

    Factor analysis is a method for modeling observed variables and their covariance structure in terms of unobserved variables (i.e., factors). There are two types of factor analyses, exploratory and confirmatory. Exploratory factor analysis (EFA) is method to explore the underlying structure of a set of observed variables, and is a crucial step ...

  11. Understanding and Using Factor Scores: Considerations for the ...

    or confirmatory factor analysis procedures, and 63 articles (27.5%) did not provide sufficient information on the methodology used. For example, many factor score methods are built on the assumption that the resulting factor scores will be uncorrelated; however, orthogonal factors are often the rarity rather than the norm in educational research.

  12. Exploratory Factor Analysis: Basics and Beyond

    Exploratory factor analysis (EFA) is a statistical method used to answer a wide range of research questions pertaining to the underlying structure of a set of variables. A primary goal of this chapter is to provide sufficient background information to foster a comprehensive understanding for the series of methodological decisions that have to ...

  13. An Introduction to Factor Analysis: Reducing Variables

    Factor analysis is a sophisticated statistical method aimed at reducing a large number of variables into a smaller set of factors. This technique is valuable for extracting the maximum common variance from all variables, transforming them into a single score for further analysis. As a part of the general linear model (GLM), factor analysis is ...

  14. Factor Analysis 101: The Basics

    Factor analysis is a powerful data reduction technique that enables researchers to investigate concepts that cannot easily be measured directly. By boiling down a large number of variables into a handful of comprehensible underlying factors, factor analysis results in easy-to-understand, actionable data.

  15. Sage Research Methods Foundations

    Methods Map. This visualization demonstrates how methods are related and connects users to relevant content. Project Planner. Find step-by-step guidance to complete your research project. Which Stats Test. Answer a handful of multiple-choice questions to see which statistical method is best for your data. Reading Lists

  16. Factor Analysis Overview

    Research Methods for Lehman EdD. 9 Factor Analysis Overview. You can load this file with open_template_file("factoranalysis"). ... This is because factor analysis is a multivariate method, controlling for many variables at once. When you do that the bivariate relationships can change.

  17. Factor Analysis

    Factor analysis is a multivariate method that can be used for analyzing large data sets with two main goals: 1. to reduce a large number of correlating variables to a fewer number of factors; 2. to structure the data with the aim of identifying dependencies between correlating variables and examining them for common causes (factors) in order to generate a new construct (factor) on this basis.

  18. Factor Analysis: Easy Definition

    Procrustes analysis is a way to compare two sets of configurations, or shapes. Originally developed to match two solutions from Factor Analysis, the technique was extended to Generalized Procrustes Analysis so that more than two shapes could be compared. The shapes are aligned to a target shape or to each other.

  19. Sage Research Methods: Business

    This guide further explains various parts and parcels of factor analysis: (1) the process of factor loading on a specific survey case, (2) the identification process for an appropriate number of factors and optimal combination of factors, depending on the specific research design and goals, and (3) an explanation of dimensions, their reduction ...

  20. (PDF) Overview of Factor Analysis

    Chapter 1. Theoretical Introduction. • Factor analysis is a collection of methods used to examine how underlying constructs influence the responses on a number of measured variables ...

  21. A Primer on Factor Analysis in Research using Reproducible R Software

    Factor analysis is a statistical method that is widely used in research to identify the underlying factors that explain the variations in a set of observed variables. The method is particularly useful in fields such as psychology, sociology, marketing, and education, where researchers often deal with complex datasets that contain many variables.

  22. Factor Analysis in Psychology: Types, How It's Used

    The primary goal of factor analysis is to distill a large data set into a working set of connections or factors. Dr. Jessie Borelli, PhD, who works at the University of California-Irvine, uses factor analysis in her work on attachment. She is doing research that looks into how people perceive relationships and how they connect to one another.

  23. Factor Analysis as a Tool for Survey Analysis

    Abstract and Figures. Factor analysis is particularly suitable to extract few factors from the large number of related variables to a more manageable number, prior to using them in other analysis ...

  24. Research on factor analysis and method for evaluating grouting ...

    Research on factor analysis and method for evaluating grouting effects using machine learning. Sci Rep. 2024 Apr 2;14(1):7782. doi: ... Analysis revealed that the correlation among the eight selected indicators, including the proportion of boreholes in the target rock strata, drilling length, leakage, water level, pressure of grouting, mass of ...

  25. Addressing medical student burnout through informal peer-assisted

    Background Despite the recognized advantages of Peer-Assisted Learning (PAL) in academic settings, there is a notable absence of research analyzing its effects on students' Academic Burnout. This study aims to cover this gap by assessing the underlying effectiveness of Informal Peer-Assisted Learning (IPAL) as a cooperative learning method, focusing on its potential to mitigate academic ...

  26. Full article: Risk of Intracranial Hemorrhage in Persons with

    Introduction. Hemophilia A (HA) is an X-linked recessive bleeding disorder caused by a deficiency of factor VIII (FVIII) [1,2]. There are an estimated 24,000–26,400 males with hemophilia A in the United States; approximately 75% of persons with hemophilia A (PWHA) have moderate or severe forms of the disease [3,4]. Intracranial hemorrhage (ICH) in PWHA is ...

  27. The overall stability of a partially unstable reservoir bank slope to

    In geotechnical analysis, the factor of safety (FOS) is crucial for slope stability assessment. Traditional methods often overlook the nuances of partial slope instability. Accurately determining geotechnical parameters for FOS in complex simulations is challenging and resource-intensive. The limit equilibrium method (LEM), considering unit weight, cohesion, and internal friction angle, is ...

  28. Genome-Wide Identification of Phytochrome-Interacting Factor (PIF) Gene

    The phytochrome-interacting factor (PIF) proteins are part of a subfamily of basic helix-loop-helix (bHLH) transcription factors that integrate with phytochromes (PHYs) and are known to play important roles in adaptive changes in plant architecture. However, the characterization and function of PIFs in potatoes are currently poorly understood. In this study, we identified seven PIF members ...

  29. Frontiers

    Slope deformation, a key factor affecting slope stability, has complexity and uncertainty. It is crucial for early warning of slope instability disasters to master the future development law of slope deformation. In this paper, a model for point prediction and probability analysis of slope deformation based on DeepAR deep learning algorithm is proposed.In addition, considering the noise ...

  30. Adaptability analysis and model development of various LS-factor

    The slope length and slope steepness factor (LS-factor) formula in the Revised Universal Soil Loss Equation (RUSLE) has a considerable level of uncertainty due to the existence of multiple methods. In this study, four commonly used formulas for the slope length factor and two formulas for the slope gradient factor were chosen and combined based on their applicability to the specific research ...
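Several of the snippets above (notably items 3, 10, and 12) describe the extraction and rotation choices made during exploratory factor analysis. As a concrete, hedged illustration of those ideas, and not code from any of the cited sources, the following Python sketch simulates a two-factor structure and recovers it with scikit-learn's maximum likelihood FactorAnalysis plus a varimax rotation; the synthetic loadings and sample size are arbitrary assumptions.

```python
# Hedged illustration tying the snippets above together: a minimal
# exploratory factor analysis using scikit-learn's maximum likelihood
# FactorAnalysis with varimax rotation. The two-factor synthetic data set
# is an assumption for demonstration; it is not from any cited article.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 300 respondents on 6 observed variables driven by 2 latent factors
n = 300
loadings = np.array([
    [0.8, 0.0], [0.7, 0.1], [0.9, 0.0],   # variables loading on factor 1
    [0.0, 0.7], [0.1, 0.8], [0.0, 0.9],   # variables loading on factor 2
])
factors = rng.normal(size=(n, 2))
X = factors @ loadings.T + rng.normal(scale=0.4, size=(n, 6))

# Extract 2 factors by maximum likelihood, then apply a varimax rotation
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
print("Estimated loadings (varimax-rotated):")
print(np.round(fa.components_.T, 2))
```

With the simulated structure above, the rotated loading matrix should show the first three variables loading mainly on one factor and the last three on the other, mirroring the interpretation step described in the snippets.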