Content Analysis | Guide, Methods & Examples

Published on July 18, 2019 by Amy Luo. Revised on June 22, 2023.

Content analysis is a research method used to identify patterns in recorded communication. To conduct content analysis, you systematically collect data from a set of texts, which can be written, oral, or visual:

  • Books, newspapers and magazines
  • Speeches and interviews
  • Web content and social media posts
  • Photographs and films

Content analysis can be both quantitative (focused on counting and measuring) and qualitative (focused on interpreting and understanding). In both types, you categorize or “code” words, themes, and concepts within the texts and then analyze the results.

Table of contents

  • What is content analysis used for?
  • Advantages of content analysis
  • Disadvantages of content analysis
  • How to conduct content analysis
  • Other interesting articles

What is content analysis used for?

Researchers use content analysis to find out about the purposes, messages, and effects of communication content. They can also make inferences about the producers and audience of the texts they analyze.

Content analysis can be used to quantify the occurrence of certain words, phrases, subjects or concepts in a set of historical or contemporary texts.

Quantitative content analysis example

To research the importance of employment issues in political campaigns, you could analyze campaign speeches for the frequency of terms such as unemployment, jobs, and work, and use statistical analysis to find differences over time or between candidates.
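As a rough illustration of this kind of counting, the Python sketch below tallies those terms across a set of speeches. The speech snippets and term list are made-up placeholders, not real campaign data:

```python
import re
from collections import Counter

# Hypothetical sample data; in practice you would load full speech transcripts.
speeches = {
    "candidate_a": "Jobs are my priority. Unemployment is down, but we need more jobs.",
    "candidate_b": "Work must pay. We will put people back to work and cut unemployment.",
}
terms = {"unemployment", "jobs", "work"}

for candidate, text in speeches.items():
    tokens = re.findall(r"[a-z']+", text.lower())      # simple word tokenizer
    counts = Counter(t for t in tokens if t in terms)  # keep only the coded terms
    print(candidate, dict(counts))
# candidate_a {'jobs': 2, 'unemployment': 1}
# candidate_b {'work': 2, 'unemployment': 1}
```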

In addition, content analysis can be used to make qualitative inferences by analyzing the meaning and semantic relationship of words and concepts.

Qualitative content analysis example

To gain a more qualitative understanding of employment issues in political campaigns, you could locate the word unemployment in speeches, identify what other words or phrases appear next to it (such as economy, inequality, or laziness), and analyze the meanings of these relationships to better understand the intentions and targets of different campaigns.
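A minimal sketch of such a collocate search, assuming plain-text speeches; the window size and example sentence are arbitrary choices for illustration:

```python
import re
from collections import Counter

def collocates(text: str, keyword: str, window: int = 4) -> Counter:
    """Count words appearing within `window` words of `keyword`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            hits.update(tokens[lo:i] + tokens[i + 1:hi])
    return hits

speech = "Unemployment reflects a weak economy, and unemployment deepens inequality."
print(collocates(speech, "unemployment").most_common(3))
```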

Because content analysis can be applied to a broad range of texts, it is used in a variety of fields, including marketing, media studies, anthropology, cognitive science, psychology, and many social science disciplines. It has various possible goals:

  • Finding correlations and patterns in how concepts are communicated
  • Understanding the intentions of an individual, group or institution
  • Identifying propaganda and bias in communication
  • Revealing differences in communication in different contexts
  • Analyzing the consequences of communication content, such as the flow of information or audience responses


Advantages of content analysis

  • Unobtrusive data collection

You can analyze communication and social interaction without the direct involvement of participants, so your presence as a researcher doesn’t influence the results.

  • Transparent and replicable

When done well, content analysis follows a systematic procedure that can easily be replicated by other researchers, yielding results with high reliability.

  • Highly flexible

You can conduct content analysis at any time, in any location, and at low cost – all you need is access to the appropriate sources.

Disadvantages of content analysis

  • Reductive

Focusing on words or phrases in isolation can sometimes be overly reductive, disregarding context, nuance, and ambiguous meanings.

  • Subjective

Content analysis almost always involves some level of subjective interpretation, which can affect the reliability and validity of the results and conclusions, leading to various types of research bias and cognitive bias.

  • Time intensive

Manually coding large volumes of text is extremely time-consuming, and it can be difficult to automate effectively.

How to conduct content analysis

If you want to use content analysis in your research, you need to start with a clear, direct research question.

Example research question for content analysis

Is there a difference in how the US media represents younger politicians compared to older ones in terms of trustworthiness?

Next, you follow these five steps.

1. Select the content you will analyze

Based on your research question, choose the texts that you will analyze. You need to decide:

  • The medium (e.g. newspapers, speeches or websites) and genre (e.g. opinion pieces, political campaign speeches, or marketing copy)
  • The inclusion and exclusion criteria (e.g. newspaper articles that mention a particular event, speeches by a certain politician, or websites selling a specific type of product)
  • The parameters in terms of date range, location, etc.

If there are only a small number of texts that meet your criteria, you might analyze all of them. If there is a large volume of texts, you can select a sample.

2. Define the units and categories of analysis

Next, you need to determine the level at which you will analyze your chosen texts. This means defining:

  • The unit(s) of meaning that will be coded. For example, are you going to record the frequency of individual words and phrases, the characteristics of people who produced or appear in the texts, the presence and positioning of images, or the treatment of themes and concepts?
  • The set of categories that you will use for coding. Categories can be objective characteristics (e.g. aged 30–40, lawyer, parent) or more conceptual (e.g. trustworthy, corrupt, conservative, family oriented).

Your units of analysis are the politicians who appear in each article and the words and phrases used to describe them. Following your research question, you categorize by age and by the concept of trustworthiness. To get more detailed data, you also code for other categories, such as the political party and marital status of each politician mentioned.

3. Develop a set of rules for coding

Coding involves organizing the units of meaning into the previously defined categories. Especially with more conceptual categories, it’s important to clearly define the rules for what will and won’t be included to ensure that all texts are coded consistently.

Coding rules are especially important if multiple researchers are involved, but even if you’re coding all of the text by yourself, recording the rules makes your method more transparent and reliable.

In considering the category “younger politician,” you decide which titles will be coded with this category (senator, governor, counselor, mayor). With “trustworthy,” you decide which specific words or phrases related to trustworthiness (e.g. honest and reliable) will be coded in this category.

4. Code the text according to the rules

You go through each text and record all relevant data in the appropriate categories. This can be done manually or aided with computer programs, such as QSR NVivo, Atlas.ti, and Diction, which can help speed up the process of counting and categorizing words and phrases.

Following your coding rules, you examine each newspaper article in your sample. You record the characteristics of each politician mentioned, along with all words and phrases related to trustworthiness that are used to describe them.
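If part of this coding pass were scripted rather than done by hand, a minimal rule-based version might look like the sketch below. The keyword lists and sentences are illustrative assumptions, not an established coding scheme:

```python
import re

# Illustrative coding rules: category -> keywords that trigger it.
CODES = {
    "trustworthy": {"honest", "reliable", "trustworthy", "dependable"},
    "untrustworthy": {"corrupt", "dishonest", "unreliable"},
}

def code_sentence(sentence: str) -> list[str]:
    """Return every category whose keywords appear in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return [cat for cat, words in CODES.items() if tokens & words]

article = ("The senator was praised as honest and reliable. "
           "Critics called the young mayor dishonest.")
for sentence in re.split(r"(?<=\.)\s+", article):
    print(code_sentence(sentence), "<-", sentence)
```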

5. Analyze the results and draw conclusions

Once coding is complete, the collected data is examined to find patterns and draw conclusions in response to your research question. You might use statistical analysis to find correlations or trends, discuss your interpretations of what the results mean, and make inferences about the creators, context and audience of the texts.

Let’s say the results reveal that words and phrases related to trustworthiness appeared in the same sentence as an older politician more frequently than they did in the same sentence as a younger politician. From these results, you conclude that national newspapers present older politicians as more trustworthy than younger politicians, and infer that this might have an effect on readers’ perceptions of younger people in politics.
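For the statistical part of this step, one common option is a chi-square test of independence on the coded counts. A minimal sketch using SciPy, with made-up counts for the trustworthiness example:

```python
from scipy.stats import chi2_contingency

# Hypothetical coded counts: rows = age group, columns = sentences with /
# without a trustworthiness term in the same sentence as the politician.
#                 with   without
table = [[45, 155],   # older politicians
         [20, 180]]   # younger politicians

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
# A small p-value suggests the association is unlikely to be chance alone,
# though it says nothing about why the pattern exists.
```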

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Measures of central tendency
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Thematic analysis
  • Cohort study
  • Peer review
  • Ethnography

Research bias

  • Implicit bias
  • Cognitive bias
  • Conformity bias
  • Hawthorne effect
  • Availability heuristic
  • Attrition bias
  • Social desirability bias

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Luo, A. (2023, June 22). Content Analysis | Guide, Methods & Examples. Scribbr. Retrieved April 15, 2024, from https://www.scribbr.com/methodology/content-analysis/


Content Analysis – Methods, Types and Examples

Definition:

Content analysis is a research method used to analyze and interpret the characteristics of various forms of communication, such as text, images, or audio. It involves systematically analyzing the content of these materials, identifying patterns, themes, and other relevant features, and drawing inferences or conclusions based on the findings.

Content analysis can be used to study a wide range of topics, including media coverage of social issues, political speeches, advertising messages, and online discussions, among others. It is often used in qualitative research and can be combined with other methods to provide a more comprehensive understanding of a particular phenomenon.

Types of Content Analysis

There are generally two types of content analysis:

Quantitative Content Analysis

This type of content analysis involves the systematic and objective counting and categorization of the content of a particular form of communication, such as text or video. The data obtained is then subjected to statistical analysis to identify patterns, trends, and relationships between different variables. Quantitative content analysis is often used to study media content, advertising, and political speeches.

Qualitative Content Analysis

This type of content analysis is concerned with the interpretation and understanding of the meaning and context of the content. It involves the systematic analysis of the content to identify themes, patterns, and other relevant features, and to interpret the underlying meanings and implications of these features. Qualitative content analysis is often used to study interviews, focus groups, and other forms of qualitative data, where the researcher is interested in understanding the subjective experiences and perceptions of the participants.

Methods of Content Analysis

There are several methods of content analysis, including:

Conceptual Analysis

This method involves analyzing the meanings of key concepts used in the content being analyzed. The researcher identifies key concepts and analyzes how they are used, defining them and categorizing them into broader themes.

Content Analysis by Frequency

This method involves counting and categorizing the frequency of specific words, phrases, or themes that appear in the content being analyzed. The researcher identifies relevant keywords or phrases and systematically counts their frequency.

Comparative Analysis

This method involves comparing the content of two or more sources to identify similarities, differences, and patterns. The researcher selects relevant sources, identifies key themes or concepts, and compares how they are represented in each source.

Discourse Analysis

This method involves analyzing the structure and language of the content being analyzed to identify how the content constructs and represents social reality. The researcher analyzes the language used and the underlying assumptions, beliefs, and values reflected in the content.

Narrative Analysis

This method involves analyzing the content as a narrative, identifying the plot, characters, and themes, and analyzing how they relate to the broader social context. The researcher identifies the underlying messages conveyed by the narrative and their implications for the broader social context.

How to Conduct a Content Analysis

Here is a basic guide to conducting a content analysis:

  • Define your research question or objective: Before starting your content analysis, you need to define your research question or objective clearly. This will help you to identify the content you need to analyze and the type of analysis you need to conduct.
  • Select your sample: Select a representative sample of the content you want to analyze. This may involve selecting a random sample, a purposive sample, or a convenience sample, depending on the research question and the availability of the content.
  • Develop a coding scheme: Develop a coding scheme or a set of categories to use for coding the content. The coding scheme should be based on your research question or objective and should be reliable, valid, and comprehensive.
  • Train coders: Train coders to use the coding scheme and ensure that they have a clear understanding of the coding categories and procedures. You may also need to establish inter-coder reliability to ensure that different coders are coding the content consistently (a minimal reliability check is sketched after this list).
  • Code the content: Code the content using the coding scheme. This may involve manually coding the content, using software, or a combination of both.
  • Analyze the data: Once the content is coded, analyze the data using appropriate statistical or qualitative methods, depending on the research question and the type of data.
  • Interpret the results: Interpret the results of the analysis in the context of your research question or objective. Draw conclusions based on the findings and relate them to the broader literature on the topic.
  • Report your findings: Report your findings in a clear and concise manner, including the research question, methodology, results, and conclusions. Provide details about the coding scheme, inter-coder reliability, and any limitations of the study.

Applications of Content Analysis

Content analysis has numerous applications across different fields, including:

  • Media Research: Content analysis is commonly used in media research to examine the representation of different groups, such as race, gender, and sexual orientation, in media content. It can also be used to study media framing, media bias, and media effects.
  • Political Communication: Content analysis can be used to study political communication, including political speeches, debates, and news coverage of political events. It can also be used to study political advertising and the impact of political communication on public opinion and voting behavior.
  • Marketing Research: Content analysis can be used to study advertising messages, consumer reviews, and social media posts related to products or services. It can provide insights into consumer preferences, attitudes, and behaviors.
  • Health Communication: Content analysis can be used to study health communication, including the representation of health issues in the media, the effectiveness of health campaigns, and the impact of health messages on behavior.
  • Education Research: Content analysis can be used to study educational materials, including textbooks, curricula, and instructional materials. It can provide insights into the representation of different topics, perspectives, and values.
  • Social Science Research: Content analysis can be used in a wide range of social science research, including studies of social media, online communities, and other forms of digital communication. It can also be used to study interviews, focus groups, and other qualitative data sources.

Examples of Content Analysis

Here are some examples of content analysis:

  • Media Representation of Race and Gender: A content analysis could be conducted to examine the representation of different races and genders in popular media, such as movies, TV shows, and news coverage.
  • Political Campaign Ads: A content analysis could be conducted to study political campaign ads and the themes and messages used by candidates.
  • Social Media Posts: A content analysis could be conducted to study social media posts related to a particular topic, such as the COVID-19 pandemic, to examine the attitudes and beliefs of social media users.
  • Instructional Materials: A content analysis could be conducted to study the representation of different topics and perspectives in educational materials, such as textbooks and curricula.
  • Product Reviews: A content analysis could be conducted to study product reviews on e-commerce websites, such as Amazon, to identify common themes and issues mentioned by consumers.
  • News Coverage of Health Issues: A content analysis could be conducted to study news coverage of health issues, such as vaccine hesitancy, to identify common themes and perspectives.
  • Online Communities: A content analysis could be conducted to study online communities, such as discussion forums or social media groups, to understand the language, attitudes, and beliefs of the community members.

Purpose of Content Analysis

The purpose of content analysis is to systematically analyze and interpret the content of various forms of communication, such as written, oral, or visual, to identify patterns, themes, and meanings. Content analysis is used to study communication in a wide range of fields, including media studies, political science, psychology, education, sociology, and marketing research. The primary goals of content analysis include:

  • Describing and summarizing communication: Content analysis can be used to describe and summarize the content of communication, such as the themes, topics, and messages conveyed in media content, political speeches, or social media posts.
  • Identifying patterns and trends: Content analysis can be used to identify patterns and trends in communication, such as changes over time, differences between groups, or common themes or motifs.
  • Exploring meanings and interpretations: Content analysis can be used to explore the meanings and interpretations of communication, such as the underlying values, beliefs, and assumptions that shape the content.
  • Testing hypotheses and theories: Content analysis can be used to test hypotheses and theories about communication, such as the effects of media on attitudes and behaviors or the framing of political issues in the media.

When to use Content Analysis

Content analysis is a useful method when you want to analyze and interpret the content of various forms of communication, such as written, oral, or visual. Here are some specific situations where content analysis might be appropriate:

  • When you want to study media content: Content analysis is commonly used in media studies to analyze the content of TV shows, movies, news coverage, and other forms of media.
  • When you want to study political communication: Content analysis can be used to study political speeches, debates, news coverage, and advertising.
  • When you want to study consumer attitudes and behaviors: Content analysis can be used to analyze product reviews, social media posts, and other forms of consumer feedback.
  • When you want to study educational materials: Content analysis can be used to analyze textbooks, instructional materials, and curricula.
  • When you want to study online communities: Content analysis can be used to analyze discussion forums, social media groups, and other forms of online communication.
  • When you want to test hypotheses and theories: Content analysis can be used to test hypotheses and theories about communication, such as the framing of political issues in the media or the effects of media on attitudes and behaviors.

Characteristics of Content Analysis

Content analysis has several key characteristics that make it a useful research method. These include:

  • Objectivity: Content analysis aims to be an objective method of research, meaning that the researcher does not introduce their own biases or interpretations into the analysis. This is achieved by using standardized and systematic coding procedures.
  • Systematic: Content analysis involves the use of a systematic approach to analyze and interpret the content of communication. This involves defining the research question, selecting the sample of content to analyze, developing a coding scheme, and analyzing the data.
  • Quantitative: Content analysis often involves counting and measuring the occurrence of specific themes or topics in the content, making it a quantitative research method. This allows for statistical analysis and generalization of findings.
  • Contextual: Content analysis considers the context in which the communication takes place, such as the time period, the audience, and the purpose of the communication.
  • Iterative: Content analysis is an iterative process, meaning that the researcher may refine the coding scheme and analysis as they analyze the data, to ensure that the findings are valid and reliable.
  • Reliability and validity: Content analysis aims to be a reliable and valid method of research, meaning that the findings are consistent and accurate. This is achieved through inter-coder reliability tests and other measures to ensure the quality of the data and analysis.

Advantages of Content Analysis

There are several advantages to using content analysis as a research method, including:

  • Objective and systematic: Content analysis aims to be an objective and systematic method of research, which reduces the likelihood of bias and subjectivity in the analysis.
  • Large sample size: Content analysis allows for the analysis of a large sample of data, which increases the statistical power of the analysis and the generalizability of the findings.
  • Non-intrusive: Content analysis does not require the researcher to interact with the participants or disrupt their natural behavior, making it a non-intrusive research method.
  • Accessible data: Content analysis can be used to analyze a wide range of data types, including written, oral, and visual communication, making it accessible to researchers across different fields.
  • Versatile: Content analysis can be used to study communication in a wide range of contexts and fields, including media studies, political science, psychology, education, sociology, and marketing research.
  • Cost-effective: Content analysis is a cost-effective research method, as it does not require expensive equipment or participant incentives.

Limitations of Content Analysis

While content analysis has many advantages, there are also some limitations to consider, including:

  • Limited contextual information: Content analysis is focused on the content of communication, which means that contextual information may be limited. This can make it difficult to fully understand the meaning behind the communication.
  • Limited ability to capture nonverbal communication: Content analysis is limited to analyzing the content of communication that can be captured in written or recorded form. It may miss out on nonverbal communication, such as body language or tone of voice.
  • Subjectivity in coding: While content analysis aims to be objective, there may be subjectivity in the coding process. Different coders may interpret the content differently, which can lead to inconsistent results.
  • Limited ability to establish causality: Content analysis is a correlational research method, meaning that it cannot establish causality between variables. It can only identify associations between variables.
  • Limited generalizability: Content analysis is limited to the data that is analyzed, which means that the findings may not be generalizable to other contexts or populations.
  • Time-consuming: Content analysis can be a time-consuming research method, especially when analyzing a large sample of data. This can be a disadvantage for researchers who need to complete their research in a short amount of time.


Content Analysis

Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within some given qualitative data (i.e. text). Using content analysis, researchers can quantify and analyze the presence, meanings, and relationships of such words, themes, or concepts. As an example, researchers can evaluate the language used within a news article to search for bias or partiality. Researchers can then make inferences about the messages within the texts, the writer(s), the audience, and even the culture and time surrounding the text.

Description

Sources of data could be interviews, open-ended questions, field research notes, conversations, or literally any occurrence of communicative language (such as books, essays, discussions, newspaper headlines, speeches, media, and historical documents). A single study may analyze various forms of text. To analyze the text using content analysis, the text must be coded, or broken down, into manageable categories (“codes”). The codes can then be grouped into broader “code categories” to summarize the data even further.

Three different definitions of content analysis are provided below.

Definition 1: “Any technique for making inferences by systematically and objectively identifying special characteristics of messages.” (from Holsti, 1968)

Definition 2: “An interpretive and naturalistic approach. It is both observational and narrative in nature and relies less on the experimental elements normally associated with scientific research (reliability, validity, and generalizability).” (from Ethnography, Observational Research, and Narrative Inquiry, 1994–2012)

Definition 3: “A research technique for the objective, systematic and quantitative description of the manifest content of communication.” (from Berelson, 1952)

Uses of Content Analysis

  • Identify the intentions, focus, or communication trends of an individual, group, or institution
  • Describe attitudinal and behavioral responses to communications
  • Determine the psychological or emotional state of persons or groups
  • Reveal international differences in communication content
  • Reveal patterns in communication content
  • Pre-test and improve an intervention or survey prior to launch
  • Analyze focus group interviews and open-ended questions to complement quantitative data

Types of Content Analysis

There are two general types of content analysis: conceptual analysis and relational analysis. Conceptual analysis determines the existence and frequency of concepts in a text. Relational analysis develops the conceptual analysis further by examining the relationships among concepts in a text. Each type of analysis may lead to different results, conclusions, interpretations and meanings.

Conceptual Analysis

Typically people think of conceptual analysis when they think of content analysis. In conceptual analysis, a concept is chosen for examination and the analysis involves quantifying and counting its presence. The main goal is to examine the occurrence of selected terms in the data. Terms may be explicit or implicit. Explicit terms are easy to identify. Coding implicit terms is more complicated: you need to decide the level of implication to allow, and the judgments involved introduce subjectivity (an issue for reliability and validity). Coding of implicit terms therefore involves using a dictionary, contextual translation rules, or both.

To begin a conceptual content analysis, first identify the research question and choose a sample or samples for analysis. Next, the text must be coded into manageable content categories. This is basically a process of selective reduction. By reducing the text to categories, the researcher can focus on and code for specific words or patterns that inform the research question.

General steps for conducting a conceptual content analysis:

1. Decide the level of analysis: word, word sense, phrase, sentence, or theme

2. Decide how many concepts to code for: develop a pre-defined or interactive set of categories or concepts. Decide either: A. to allow flexibility to add categories through the coding process, or B. to stick with the pre-defined set of categories.

Option A allows for the introduction and analysis of new and important material that could have significant implications to one’s research question.

Option B allows the researcher to stay focused and examine the data for specific concepts.

3. Decide whether to code for existence or frequency of a concept. The decision changes the coding process.

When coding for the existence of a concept, the researcher counts a concept only once, no matter how many times it appears in the data.

When coding for the frequency of a concept, the researcher would count the number of times a concept appears in a text.
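The difference is easy to see in code. A minimal sketch, assuming a simple word-level analysis with a made-up concept list:

```python
import re

text = "Jobs, jobs, jobs! Unemployment is falling."
tokens = re.findall(r"[a-z]+", text.lower())
concepts = ("jobs", "unemployment", "economy")

# Existence coding: 1 if the concept appears at all, regardless of count.
existence = {c: int(c in tokens) for c in concepts}
# Frequency coding: the number of times the concept appears.
frequency = {c: tokens.count(c) for c in concepts}

print(existence)  # {'jobs': 1, 'unemployment': 1, 'economy': 0}
print(frequency)  # {'jobs': 3, 'unemployment': 1, 'economy': 0}
```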

4. Decide on how you will distinguish among concepts:

Should words be coded exactly as they appear, or coded as the same when they appear in different forms? For example, “dangerous” vs. “dangerousness.” The point here is to create coding rules so that these word segments are transparently categorized in a logical fashion. The rules could make all of these word segments fall into the same category, or the rules could be formulated so that the researcher can distinguish them into separate codes.

What level of implication is to be allowed? Words that imply the concept, or only words that explicitly state it? For example, “dangerous” vs. “the person is scary” vs. “that person could cause harm to me.” These word segments may not merit separate categories, due to the implicit meaning of “dangerous.”
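One way to implement the word-form rule from the first question in this step is to normalize words before coding, so that variants such as “dangerous” and “dangerousness” fall into the same code. The sketch below uses a deliberately crude suffix-stripping rule; a real project might instead use an established stemmer (e.g., NLTK’s PorterStemmer) or an explicit mapping dictionary:

```python
import re

def crude_stem(word: str) -> str:
    """Strip a few common suffixes so variant word forms share one code."""
    return re.sub(r"(ousness|ness|ously|ous|ly|s)$", "", word.lower())

for w in ["dangerous", "dangerousness", "dangerously", "danger"]:
    print(w, "->", crude_stem(w))  # all four map to the single code "danger"
```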

5. Develop rules for coding your texts. After the decisions in steps 1–4 are complete, a researcher can begin developing rules for translating text into codes. This will keep the coding process organized and consistent. The researcher can code for exactly what he/she wants to code. Validity of the coding process is ensured when the researcher is consistent and coherent in their codes, meaning that they follow their translation rules. In content analysis, abiding by the translation rules is equivalent to validity.

6. Decide what to do with irrelevant information: should this be ignored (e.g. common English words like “the” and “and”), or used to reexamine the coding scheme in the case that it would add to the outcome of coding?

7. Code the text: This can be done by hand or by using software. By using software, researchers can input categories and have coding done automatically, quickly and efficiently, by the software program. When coding is done by hand, a researcher can recognize errors far more easily (e.g. typos, misspelling). If using computer coding, text could be cleaned of errors to include all available data. This decision of hand vs. computer coding is most relevant for implicit information where category preparation is essential for accurate coding.

8. Analyze your results: Draw conclusions and generalizations where possible. Determine what to do with irrelevant, unwanted, or unused text: reexamine, ignore, or reassess the coding scheme. Interpret results carefully as conceptual content analysis can only quantify the information. Typically, general trends and patterns can be identified.

Relational Analysis

Relational analysis begins like conceptual analysis, where a concept is chosen for examination. However, the analysis involves exploring the relationships between concepts. Individual concepts are viewed as having no inherent meaning; rather, meaning is a product of the relationships among concepts.

To begin a relational content analysis, first identify a research question and choose a sample or samples for analysis. The research question must be focused so the concept types are not open to interpretation and can be summarized. Next, select the text for analysis carefully, balancing two concerns: having enough information for a thorough analysis, so results are not limited, against having so much information that the coding process becomes too arduous to supply meaningful and worthwhile results.

There are three subcategories of relational analysis to choose from prior to going on to the general steps.

Affect extraction: an emotional evaluation of concepts explicit in a text. A challenge to this method is that emotions can vary across time, populations, and space. However, it could be effective at capturing the emotional and psychological state of the speaker or writer of the text.

Proximity analysis: an evaluation of the co-occurrence of explicit concepts in the text. Text is defined as a string of words called a “window” that is scanned for the co-occurrence of concepts. The result is the creation of a “concept matrix”, or a group of interrelated co-occurring concepts that would suggest an overall meaning.

Cognitive mapping: a visualization technique for either affect extraction or proximity analysis. Cognitive mapping attempts to create a model of the overall meaning of the text such as a graphic map that represents the relationships between concepts.
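A bare-bones version of the window scan behind proximity analysis, assuming word-level concepts; the window size, concept list, and sentence are arbitrary choices for illustration. The resulting pair counts are the raw material for a concept matrix:

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(text: str, concepts: set[str], window: int = 5) -> Counter:
    """Count pairs of concepts whose occurrences lie within `window` words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positions = [(i, t) for i, t in enumerate(tokens) if t in concepts]
    pairs = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and j - i <= window:
            pairs[tuple(sorted((a, b)))] += 1
    return pairs

text = "Rising unemployment hurts the economy, and a weak economy fuels inequality."
print(cooccurrence(text, {"unemployment", "economy", "inequality"}))
```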

General steps for conducting a relational content analysis:

1. Determine the type of analysis: Once the sample has been selected, the researcher needs to determine what types of relationships to examine and the level of analysis: word, word sense, phrase, sentence, or theme.

2. Reduce the text to categories and code for words or patterns. A researcher can code for the existence of meanings or words.

3. Explore the relationships between concepts: once the words are coded, the text can be analyzed for the following:

Strength of relationship: degree to which two or more concepts are related.

Sign of relationship: are concepts positively or negatively related to each other?

Direction of relationship: the types of relationship that categories exhibit. For example, “X implies Y,” “X occurs before Y,” “if X then Y,” or “X is the primary motivator of Y.”

4. Code the relationships: a difference between conceptual and relational analysis is that the statements or relationships between concepts are coded.

5. Perform statistical analyses: explore differences or look for relationships among the variables identified during coding.

6. Map out representations: such as decision mapping and mental models.

Reliability and Validity

Reliability: Because researchers are human, coding errors can never be eliminated, only minimized. Generally, 80% agreement is an acceptable margin for reliability. Three criteria comprise the reliability of a content analysis:

Stability: the tendency for coders to consistently re-code the same data in the same way over a period of time.

Reproducibility: the tendency for a group of coders to classify category membership in the same way.

Accuracy: the extent to which the classification of text corresponds statistically to a standard or norm.

Validity: Three criteria comprise the validity of a content analysis:

Closeness of categories: this can be achieved by utilizing multiple classifiers to arrive at an agreed upon definition of each specific category. Using multiple classifiers, a concept category that may be an explicit variable can be broadened to include synonyms or implicit variables.

Conclusions: What level of implication is allowable? Do conclusions correctly follow the data? Are results explainable by other phenomena? This becomes especially problematic when using computer software for analysis and distinguishing between synonyms. For example, the word “mine,” variously denotes a personal pronoun, an explosive device, and a deep hole in the ground from which ore is extracted. Software can obtain an accurate count of that word’s occurrence and frequency, but not be able to produce an accurate accounting of the meaning inherent in each particular usage. This problem could throw off one’s results and make any conclusion invalid.

Generalizability of the results to a theory: dependent on the clear definitions of concept categories, how they are determined and how reliable they are at measuring the idea one is seeking to measure. Generalizability parallels reliability as much of it depends on the three criteria for reliability.

Advantages of Content Analysis

  • Directly examines communication using text
  • Allows for both qualitative and quantitative analysis
  • Provides valuable historical and cultural insights over time
  • Allows a closeness to data
  • The coded form of the text can be statistically analyzed
  • An unobtrusive means of analyzing interactions
  • Provides insight into complex models of human thought and language use
  • When done well, is considered a relatively “exact” research method
  • A readily understood and inexpensive research method
  • A more powerful tool when combined with other research methods such as interviews, observation, and use of archival records; it is very useful for analyzing historical material, especially for documenting trends over time

Disadvantages of Content Analysis

  • Can be extremely time consuming
  • Is subject to increased error, particularly when relational analysis is used to attain a higher level of interpretation
  • Is often devoid of a theoretical base, or attempts too liberally to draw meaningful inferences about the relationships and impacts implied in a study
  • Is inherently reductive, particularly when dealing with complex texts
  • Tends too often to simply consist of word counts
  • Often disregards the context that produced the text, as well as the state of things after the text is produced
  • Can be difficult to automate or computerize

Textbooks & Chapters  

Berelson, Bernard. Content Analysis in Communication Research. New York: Free Press, 1952.

Busha, Charles H. and Stephen P. Harter. Research Methods in Librarianship: Techniques and Interpretation. New York: Academic Press, 1980.

de Sola Pool, Ithiel. Trends in Content Analysis. Urbana: University of Illinois Press, 1959.

Krippendorff, Klaus. Content Analysis: An Introduction to its Methodology. Beverly Hills: Sage Publications, 1980.

Fielding, NG & Lee, RM. Using Computers in Qualitative Research. SAGE Publications, 1991. (Refer to Chapter by Seidel, J. ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’.)

Methodological Articles  

Hsieh HF & Shannon SE. (2005). Three Approaches to Qualitative Content Analysis. Qualitative Health Research. 15(9): 1277-1288.

Elo S, Kaariainen M, Kanste O, Polkki T, Utriainen K, & Kyngas H. (2014). Qualitative Content Analysis: A focus on trustworthiness. Sage Open. 4:1-10.

Application Articles  

Abroms LC, Padmanabhan N, Thaweethai L, & Phillips T. (2011). iPhone Apps for Smoking Cessation: A content analysis. American Journal of Preventive Medicine. 40(3):279-285.

Ullstrom S. Sachs MA, Hansson J, Ovretveit J, & Brommels M. (2014). Suffering in Silence: a qualitative study of second victims of adverse events. British Medical Journal, Quality & Safety Issue. 23:325-331.

Owen P. (2012). Portrayals of Schizophrenia by Entertainment Media: A Content Analysis of Contemporary Movies. Psychiatric Services. 63:655-659.

Choosing whether to conduct a content analysis by hand or by using computer software can be difficult. Refer to ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’ listed above in “Textbooks and Chapters” for a discussion of the issue.

QSR NVivo:  http://www.qsrinternational.com/products.aspx

Atlas.ti:  http://www.atlasti.com/webinars.html

R- RQDA package:  http://rqda.r-forge.r-project.org/

Rolly Constable, Marla Cowell, Sarita Zornek Crawford, David Golden, Jake Hartvigsen, Kathryn Morgan, Anne Mudgett, Kris Parrish, Laura Thomas, Erika Yolanda Thompson, Rosie Turner, and Mike Palmquist. (1994-2012). Ethnography, Observational Research, and Narrative Inquiry. Writing@CSU. Colorado State University. Available at: https://writing.colostate.edu/guides/guide.cfm?guideid=63.

Michael Palmquist’s introduction to content analysis, cited above, is the main resource on content analysis on the Web. It is comprehensive, yet succinct, and includes examples and an annotated bibliography. The narrative above draws heavily from and summarizes that excellent resource, streamlined for doctoral students and junior researchers in epidemiology.

At Columbia University Mailman School of Public Health, more detailed training is available through the Department of Sociomedical Sciences- P8785 Qualitative Research Methods.



  • What is content analysis?

When you're conducting qualitative research, you'll find yourself analyzing various texts. Perhaps you'll be evaluating transcripts from audio interviews you've conducted. Or you may find yourself assessing the results of a survey filled with open-ended questions.


Content analysis is a research method used to identify the presence of various concepts, words, and themes in different texts. Two types of content analysis exist: conceptual analysis and relational analysis. In the former, researchers determine whether and how frequently certain concepts appear in a text. In relational analysis, researchers explore how different concepts are related to one another in a text.

Both types of content analysis require the researcher to code the text. Coding the text means breaking it down into different categories that allow it to be analyzed more easily.

  • What are some common uses of content analysis?

You can use content analysis to analyze many forms of text, including:

  • Interview and discussion transcripts
  • Newspaper articles and headlines
  • Literary works
  • Historical documents
  • Government reports
  • Academic papers
  • Music lyrics

Researchers commonly use content analysis to draw insights and conclusions from literary works. Historians and biographers may apply this approach to letters, papers, and other historical documents to gain insight into the historical figures and periods they are writing about. Market researchers can also use it to evaluate brand performance and perception.

Some researchers have used content analysis to explore differences in decision-making and other cognitive processes. While researchers traditionally used this approach to explore human cognition, content analysis is also at the heart of machine learning approaches currently being used and developed by software and AI companies.

  • Conducting a conceptual analysis

Conceptual analysis is more commonly associated with content analysis than relational analysis. 

In conceptual analysis, you're looking for the appearance and frequency of different concepts. Why? This information can help further your qualitative or quantitative analysis of a text. It's an inexpensive and easily understood research method that can help you draw inferences and conclusions about your research subject. And while it is a relatively straightforward analytical tool, it does consist of a multi-step process that you must closely follow to ensure the reliability and validity of your study.

When you're ready to conduct a conceptual analysis, refer to your research question and the text. Ask yourself what information in the text is likely to be relevant to your question. You'll need to know this to determine how you'll code the text. Then follow these steps:

1. Determine whether you're looking for explicit terms or implicit terms.

Explicit terms are those that directly appear in the text, while implicit ones are those that the text implies or alludes to or that you can infer. 

Coding for explicit terms is straightforward. For example, if you're looking to code a text for an author's explicit use of color, you'd simply code for every instance a color appears in the text. However, if you're coding for implicit terms, you'll need to determine and define how you're identifying the presence of the term first. Doing so involves a certain amount of subjectivity and may impinge upon the reliability and validity of your study.

2. Next, identify the level at which you'll conduct your analysis.

You can search for words, phrases, or sentences encapsulating your terms. You can also search for concepts and themes, but you'll need to define how you expect to identify them in the text. You must also define rules for how you'll code different terms to reduce ambiguity. For example, if, in an interview transcript, a person repeats a word one or more times in a row as a verbal tic, should you code it more than once? And what will you do with irrelevant data that appears in a term if you're coding for sentences? 

Defining these rules upfront can help make your content analysis more efficient and your final analysis more reliable and valid.

3. You'll need to determine whether you're coding for a concept or theme's existence or frequency.

If you're coding for its existence, you’ll only count it once, at its first appearance, no matter how many times it subsequently appears. If you're searching for frequency, you'll count the number of its appearances in the text.

4. You'll also want to determine the number of terms you want to code for and how you may wish to categorize them.

For example, say you're conducting a content analysis of customer service call transcripts and looking for evidence of customer dissatisfaction with a product or service. You might create categories that refer to different elements with which customers might be dissatisfied, such as price, features, packaging, technical support, and so on. Then you might look for sentences that refer to those product elements according to each category in a negative light.

5. Next, you'll need to develop translation rules for your codes.

Those rules should be clear and consistent, allowing you to keep track of your data in an organized fashion.

6. After you've determined the terms for which you're searching, your categories, and translation rules, you're ready to code.

You can do so by hand or via software. Software is quite helpful when you have multiple texts. But it also becomes more vital for you to have developed clear codes, categories, and translation rules, especially if you're looking for implicit terms and concepts. Otherwise, your software-driven analysis may miss key instances of the terms you seek.

7. When you have your text coded, it's time to analyze it.

Look for trends and patterns in your results and use them to draw relevant conclusions about your research subject.

  • Conducting a relational analysis

In a relational analysis, you're examining the relationship between different terms that appear in your text(s). Doing so requires you to code your texts in a similar fashion as in a conceptual analysis. However, depending on the type of relational analysis you're trying to conduct, you may need to follow slightly different rules.

Three types of relational analyses are commonly used: affect extraction, proximity analysis, and cognitive mapping.

Affect extraction

This type of relational analysis involves evaluating the different emotional concepts found in a specific text. While the insights from affect extraction can be invaluable, conducting it may prove difficult depending on the text. For example, if the text captures people's emotional states at different times and from different populations, you may find it difficult to compare them and draw appropriate inferences.
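A toy version of affect extraction, assuming a hand-built emotion lexicon; real studies would use a validated lexicon and handle negation, intensity, and context:

```python
import re

# Hypothetical mini-lexicon: word -> (emotion, strength).
LEXICON = {
    "angry": ("anger", 0.9), "furious": ("anger", 1.0),
    "happy": ("joy", 0.8), "delighted": ("joy", 1.0),
    "afraid": ("fear", 0.8),
}

def affect_profile(text: str) -> dict[str, float]:
    """Sum lexicon strengths per emotion for the words in the text."""
    profile: dict[str, float] = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in LEXICON:
            emotion, strength = LEXICON[token]
            profile[emotion] = profile.get(emotion, 0.0) + strength
    return profile

print(affect_profile("I was delighted at first, then afraid, then furious."))
# {'joy': 1.0, 'fear': 0.8, 'anger': 1.0}
```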

Proximity analysis

A relatively simpler analytical approach than affect extraction, proximity analysis assesses the co-occurrence of explicit concepts in a text. You can create what's known as a concept matrix: a group of interrelated, co-occurring concepts. Concept matrices help you evaluate the overall meaning of a text or identify a secondary message or theme.

Cognitive mapping

You can use cognitive mapping as a way to visualize the results of either affect extraction or proximity analysis. This technique uses affect extraction or proximity analysis results to create a graphic map illustrating the relationship between co-occurring emotions or concepts.

To conduct a relational analysis, you must start by determining the type of analysis that best fits the study: affect extraction or proximity analysis. 

Complete steps one through six as outlined above. When it comes to the seventh step, analyze the text according to the relational analysis type you've chosen. During this step, feel free to use cognitive mapping to help draw inferences and conclusions about the relationships between co-occurring emotions or concepts. You can also use other tools, such as mental modeling and decision mapping, as necessary to analyze the results.

  • The advantages of content analysis

Content analysis provides researchers with a robust and inexpensive method to qualitatively and quantitatively analyze a text. By coding the data, you can perform statistical analyses of the data to affirm and reinforce conclusions you may draw. And content analysis can provide helpful insights into language use, behavioral patterns, and historical or cultural conventions that can be valuable beyond the scope of the initial study.

When content analysis is applied to interview data, the approach provides a way to closely analyze data without needing interview-subject interaction, which can be helpful in certain contexts. For example, suppose you want to analyze the perceptions of a group of geographically diverse individuals. In this case, you can conduct a content analysis of existing interview transcripts rather than incurring the time and expense of conducting new interviews.

What is meant by content analysis?

Content analysis is a research method that helps a researcher explore the occurrence of and relationships between various words, phrases, themes, or concepts in a text or set of texts. The method allows researchers in different disciplines to conduct qualitative and quantitative analyses on a variety of texts.

Where is content analysis used?

Content analysis is used in multiple disciplines, as you can use it to evaluate a variety of texts. You can find applications in anthropology, communications, history, linguistics, literary studies, marketing, political science, psychology, and sociology, among other disciplines.

What are the two types of content analysis?

Content analysis may be either conceptual or relational. In a conceptual analysis, researchers examine a text for the presence and frequency of specific words, phrases, themes, and concepts. In a relational analysis, researchers draw inferences and conclusions about the nature of the relationships of co-occurring words, phrases, themes, and concepts in a text.

What's the difference between content analysis and thematic analysis?

Content analysis typically uses a descriptive approach to the data and may use either qualitative or quantitative analytical methods. By contrast, a thematic analysis only uses qualitative methods to explore frequently occurring themes in a text.


How to do a content analysis


What is content analysis?


In research, content analysis is the process of analyzing content and its features with the aim of identifying patterns and the presence of words, themes, and concepts within the content. Simply put, content analysis is a research method that aims to present the trends, patterns, concepts, and ideas in content as objective, quantitative or qualitative data, depending on the specific use case.

As such, some of the objectives of content analysis include:

  • Simplifying complex, unstructured content.
  • Identifying trends, patterns, and relationships in the content.
  • Determining the characteristics of the content.
  • Identifying the intentions of individuals through the analysis of the content.
  • Identifying the implied aspects in the content.

Typically, when doing a content analysis, you’ll gather data not only from written text sources like newspapers, books, journals, and magazines but also from a variety of other oral and visual sources of content like:

  • Voice recordings, speeches, and interviews.
  • Web content, blogs, and social media content.
  • Films, videos, and photographs.

One of content analysis’s distinguishing features is that you can gather research data without collecting it directly from participants. In other words, when doing a content analysis, you don’t need to interact with people directly.

The process of doing a content analysis usually involves categorizing or coding concepts, words, and themes within the content and analyzing the results. We’ll look at the process in more detail below.

Why would you use a content analysis?

Typically, you’ll use content analysis when you want to:

  • Identify the intentions, communication trends, or communication patterns of an individual, a group of people, or even an institution.
  • Analyze and describe the behavioral and attitudinal responses of individuals to communications.
  • Determine the emotional or psychological state of an individual or a group of people.
  • Analyze the international differences in communication content.
  • Analyze audience responses to content.

Keep in mind, though, that these are just some examples of use cases where a content analysis might be appropriate and there are many others.

The key thing to remember is that content analysis will help you quantify the occurrence of specific words, phrases, themes, and concepts in content. Moreover, it can also be used when you want to make qualitative inferences out of the data by analyzing the semantic meanings and interrelationships between words, themes, and concepts.

Types of content analysis

In general, there are two types of content analysis: conceptual and relational analysis. Although these two types follow largely similar processes, their outcomes differ. As such, each of these types can provide different results, interpretations, and conclusions. With that in mind, let’s now look at these two types of content analysis in more detail.

Conceptual content analysis

With conceptual analysis, you’ll determine the existence of certain concepts within the content and identify their frequency. In other words, conceptual analysis involves counting the number of times a specific concept appears in the content.

Conceptual analysis is typically focused on explicit data, which means you’ll focus your analysis on a specific concept to identify its presence in the content and determine its frequency.

However, when conducting a content analysis, you can also use implicit data. This approach is more involved and complicated, and it requires the use of a dictionary, contextual translation rules, or a combination of both.

No matter what type you use, conceptual analysis brings an element of quantitative analysis into a qualitative approach to research.

Relational content analysis

Relational content analysis takes conceptual analysis a step further. While the process starts in the same way, by identifying concepts in the content, it doesn’t focus on the frequency of those concepts but rather on the relationships between them and the context in which they appear.

Before starting with a relational analysis, you’ll first need to decide on which subcategory of relational analysis you’ll use:

  • Affect extraction: With this relational content analysis approach, you’ll evaluate concepts based on their emotional attributes. You’ll typically assess these emotions on a rating scale with higher values assigned to positive emotions and lower values to negative ones. In turn, this allows you to capture the emotions of the writer or speaker at the time the content is created. The main difficulty with this approach is that emotions can differ over time and across populations.
  • Proximity analysis: With this approach, you’ll identify concepts as in conceptual analysis, but you’ll evaluate the way in which they occur together in the content. In other words, proximity analysis allows you to analyze the relationship between concepts and derive a concept matrix from which you’ll be able to develop meaning. Proximity analysis is typically used when you want to extract facts from the content rather than contextual, emotional, or cultural factors.
  • Cognitive mapping: Finally, cognitive mapping can be used with affect extraction or proximity analysis. It’s a visualization technique that allows you to create a model that represents the overall meaning of content and presents it as a graphic map of the relationships between concepts. As such, it’s also commonly used when analyzing the changes in meanings, definitions, and terms over time (see the sketch after this list).
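
As a rough illustration of cognitive mapping, the sketch below turns hypothetical co-occurrence counts (such as those a proximity analysis might produce) into a weighted graph using the networkx library; the concepts and counts are invented, and a real study would go on to draw the resulting map.

```python
import networkx as nx  # pip install networkx

# Hypothetical co-occurrence counts between concepts.
co_occurrences = {
    ("unemployment", "economy"): 12,
    ("economy", "inequality"): 7,
    ("unemployment", "inequality"): 4,
}

graph = nx.Graph()
for (concept_a, concept_b), count in co_occurrences.items():
    graph.add_edge(concept_a, concept_b, weight=count)

# Edge weights can then drive the visual map, e.g. thicker lines for
# concepts that co-occur more often.
for a, b, data in graph.edges(data=True):
    print(f"{a} -- {b}: co-occurs {data['weight']} times")
```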

Reliability and validity

Now that we’ve seen what content analysis is and looked at the different types of content analysis, it’s important to understand how reliable it is as a research method. We’ll also look at what criteria impact the validity of a content analysis.

Reliability

There are three criteria that determine the reliability of a content analysis:

  • Stability. Stability refers to the tendency of coders to consistently categorize or code the same data in the same way over time.
  • Reproducibility. This criterion refers to the tendency of different coders to classify category membership in the same way.
  • Accuracy. Accuracy refers to the extent to which the classification of content corresponds to a specific standard.

Keep in mind, though, that because you’ll need to manually code or categorize the concepts you aim to identify and analyze, you’ll never be able to eliminate human error entirely. However, you’ll be able to minimize it.

Validity

In turn, three criteria determine the validity of a content analysis:

  • Closeness of categories. This is achieved by having multiple classifiers arrive at an agreed-upon definition for each category, using either implicit variables or synonyms. In this way, a category can be broadened to include more relevant data.
  • Conclusions. Here, it’s crucial to decide what level of implication will be allowable. In other words, it’s important to consider whether the conclusions are valid based on the data or whether they can be explained by some other phenomena.
  • Generalizability of the results of the analysis to a theory. Generalizability comes down to how you determine your categories, as mentioned above, and how reliable those categories are. In turn, this relies on how accurate the categories are at measuring the concepts or ideas that you’re looking to measure.

The advantages and disadvantages of content analysis

Considering everything mentioned above, there are definite advantages and disadvantages when it comes to content analysis.

A step-by-step guide to conducting a content analysis

Let’s now look at the steps you’ll need to follow when doing a content analysis.

Step 1: Develop your research questions

The first step will always be to formulate your research questions. This is simply because, without clear and defined research questions, you won’t know what question to answer and, by implication, won’t be able to code your concepts.

Step 2: Choose the content you’ll analyze

Based on your research questions, you’ll then need to decide what content you’ll analyze. Here, you’ll use three factors to find the right content:

  • The type of content. Here, you’ll need to consider the various types of content you’ll use and their medium, for example, blog posts, social media, newspapers, or online articles.
  • What criteria you’ll use for inclusion. Here, you’ll decide what criteria you’ll use to include content. This can, for instance, be the mention of a certain event or the advertising of a specific product.
  • Your parameters. Here, you’ll decide what content you’ll include based on specified parameters in terms of date and location.

Step 3: Identify your biases

The next step is to consider your own preconceptions of the questions and identify your biases. This process is referred to as bracketing and allows you to be aware of your biases before you start your research, with the result that they’ll be less likely to influence the analysis.

Step 4: Define the units and categories of coding

Your next step is to define the units of meaning that you’ll code. This could be, for example, the number of times a concept appears in the content or the treatment of concepts, words, or themes in the content. You’ll then need to define the set of categories you’ll use for coding, which can be either objective or more conceptual.

Step 5: Develop a coding scheme

Based on the above, you’ll then organize the units of meaning into your defined categories. Apart from this, your coding scheme will also determine how you’ll analyze the data.

Step 6: Code the content

The next step is to code the content. During this process, you’ll work through the content and record the data according to your coding scheme. It’s also here that conceptual and relational analysis start to deviate in terms of the process you’ll need to follow.

As mentioned earlier, conceptual analysis aims to identify the number of times a specific concept, idea, word, or phrase appears in the content. So, here, you’ll need to decide what level of analysis you’ll implement.

In contrast, with relational analysis, you’ll need to decide what type of relational analysis you’ll use. So, you’ll need to determine whether you’ll use affect extraction, proximity analysis, cognitive mapping, or a combination of these approaches.

Step 7: Analyze the results

Once you’ve coded the data, you’ll be able to analyze it and draw conclusions from it based on your research questions.

Frequently asked questions about content analysis

Content analysis offers an inexpensive and flexible way to identify trends and patterns in communication content. In addition, it’s unobtrusive, which eliminates many ethical concerns and inaccuracies in research data. However, to be most effective, a content analysis must be planned and used carefully in order to ensure reliability and validity.

There are two general types of content analysis: conceptual and relational analysis. Although these two types follow largely similar processes, their outcomes differ. As such, each type can provide different results, interpretations, and conclusions.

In qualitative research, coding means categorizing concepts, words, and themes within your content to create a basis for analyzing the results. While coding, you work through the content and record the data according to your coding scheme.

Content analysis is the process of analyzing content and its features with the aim of identifying patterns and the presence of words, themes, and concepts within the content. The goal of a content analysis is to present the trends, patterns, concepts, and ideas in content as objective, quantitative or qualitative data, depending on the specific use case.

Content analysis is a qualitative method of data analysis and can be used in many different fields. It is particularly popular in the social sciences.

It is possible to do qualitative analysis without coding, but content analysis as a method of qualitative analysis requires coding or categorizing data to then analyze it according to your coding scheme in the next step.


Chapter 17. Content Analysis

Introduction

Content analysis is a term that is used to mean both a method of data collection and a method of data analysis. Archival and historical works can be the source of content analysis, but so too can the contemporary media coverage of a story, blogs, comment posts, films, cartoons, advertisements, brand packaging, and photographs posted on Instagram or Facebook. Really, almost anything can be the “content” to be analyzed. This is a qualitative research method because the focus is on the meanings and interpretations of that content rather than strictly numerical counts or variables-based causal modeling. [1] Qualitative content analysis (sometimes referred to as QCA) is particularly useful when attempting to define and understand prevalent stories or communication about a topic of interest—in other words, when we are less interested in what particular people (our defined sample) are doing or believing and more interested in what general narratives exist about a particular topic or issue. This chapter will explore different approaches to content analysis and provide helpful tips on how to collect data, how to turn that data into codes for analysis, and how to go about presenting what is found through analysis. It is also a nice segue between our data collection methods (e.g., interviewing, observation) chapters and chapters 18 and 19, whose focus is on coding, the primary means of data analysis for most qualitative data. In many ways, the methods of content analysis are quite similar to the method of coding.


Although the body of material (“content”) to be collected and analyzed can be nearly anything, most qualitative content analysis is applied to forms of human communication (e.g., media posts, news stories, campaign speeches, advertising jingles). The point of the analysis is to understand this communication, to systematically and rigorously explore its meanings, assumptions, themes, and patterns. Historical and archival sources may be the subject of content analysis, but there are other ways to analyze (“code”) this data when not overly concerned with the communicative aspect (see chapters 18 and 19). This is why we tend to consider content analysis its own method of data collection as well as a method of data analysis. Still, many of the techniques you learn in this chapter will be helpful to any “coding” scheme you develop for other kinds of qualitative data. Just remember that content analysis is a particular form with distinct aims and goals and traditions.

An Overview of the Content Analysis Process

The First Step: Selecting Content

Figure 17.1 is a display of possible content for content analysis. The first step in content analysis is making smart decisions about what content you will want to analyze and clearly connecting this content to your research question or general focus of research. Why are you interested in the messages conveyed in this particular content? What will the identification of patterns here help you understand? Content analysis can be fun to do, but in order to make it research, you need to fit it into a research plan.

Figure 17.1. A Non-exhaustive List of "Content" for Content Analysis

To take one example, let us imagine you are interested in gender presentations in society and how presentations of gender have changed over time. There are various forms of content out there that might help you document changes. You could, for example, begin by creating a list of magazines that are coded as being for “women” (e.g., Women’s Daily Journal ) and magazines that are coded as being for “men” (e.g., Men’s Health ). You could then select a date range that is relevant to your research question (e.g., 1950s–1970s) and collect magazines from that era. You might create a “sample” by deciding to look at three issues for each year in the date range and a systematic plan for what to look at in those issues (e.g., advertisements? Cartoons? Titles of articles? Whole articles?). You are not just going to look at some magazines willy-nilly. That would not be systematic enough to allow anyone to replicate or check your findings later on. Once you have a clear plan of what content is of interest to you and what you will be looking at, you can begin, creating a record of everything you are including as your content. This might mean a list of each advertisement you look at or each title of stories in those magazines along with its publication date. You may decide to have multiple “content” in your research plan. For each content, you want a clear plan for collecting, sampling, and documenting.

The Second Step: Collecting and Storing

Once you have a plan, you are ready to collect your data. This may entail downloading from the internet, creating a Word document or PDF of each article or picture, and storing these in a folder designated by the source and date (e.g., “ Men’s Health advertisements, 1950s”). Sølvberg ( 2021 ), for example, collected posted job advertisements for three kinds of elite jobs (economic, cultural, professional) in Sweden. But collecting might also mean going out and taking photographs yourself, as in the case of graffiti, street signs, or even what people are wearing. Chaise LaDousa, an anthropologist and linguist, took photos of “house signs,” which are signs, often creative and sometimes offensive, hung by college students living in communal off-campus houses. These signs were a focal point of college culture, sending messages about the values of the students living in them. Some of the names will give you an idea: “Boot ’n Rally,” “The Plantation,” “Crib of the Rib.” The students might find these signs funny and benign, but LaDousa ( 2011 ) argued convincingly that they also reproduced racial and gender inequalities. The data here already existed—they were big signs on houses—but the researcher had to collect the data by taking photographs.

In some cases, your content will be in physical form but not amenable to photographing, as in the case of films or unwieldy physical artifacts you find in the archives (e.g., undigitized meeting minutes or scrapbooks). In this case, you need to create some kind of detailed log (fieldnotes even) of the content that you can reference. In the case of films, this might mean watching the film and writing down details for key scenes that become your data. [2] For scrapbooks, it might mean taking notes on what you are seeing, quoting key passages, describing colors or presentation style. As you might imagine, this can take a lot of time. Be sure you budget this time into your research plan.

Researcher Note

A note on data scraping : Data scraping, sometimes known as screen scraping or frame grabbing, is a way of extracting data generated by another program, as when a scraping tool grabs information from a website. This may help you collect data that is on the internet, but you need to be ethical in how to employ the scraper. A student once helped me scrape thousands of stories from the Time magazine archives at once (although it took several hours for the scraping process to complete). These stories were freely available, so the scraping process simply sped up the laborious process of copying each article of interest and saving it to my research folder. Scraping tools can sometimes be used to circumvent paywalls. Be careful here!

The Third Step: Analysis

There is often an assumption among novice researchers that once you have collected your data, you are ready to write about what you have found. Actually, you haven’t yet found anything, and if you try to write up your results, you will probably be staring sadly at a blank page. Between the collection and the writing comes the difficult task of systematically and repeatedly reviewing the data in search of patterns and themes that will help you interpret the data, particularly its communicative aspect (e.g., What is it that is being communicated here, with these “house signs” or in the pages of Men’s Health ?).

The first time you go through the data, keep an open mind on what you are seeing (or hearing), and take notes about your observations that link up to your research question. In the beginning, it can be difficult to know what is relevant and what is extraneous. Sometimes, your research question changes based on what emerges from the data. Use the first round of review to consider this possibility, but then commit yourself to following a particular focus or path. If you are looking at how gender gets made or re-created, don’t follow the white rabbit down a hole about environmental injustice unless you decide that this really should be the focus of your study or that issues of environmental injustice are linked to gender presentation. In the second round of review, be very clear about emerging themes and patterns. Create codes (more on these in chapters 18 and 19) that will help you simplify what you are noticing. For example, “men as outdoorsy” might be a common trope you see in advertisements. Whenever you see this, mark the passage or picture. In your third (or fourth or fifth) round of review, begin to link up the tropes you’ve identified, looking for particular patterns and assumptions. You’ve drilled down to the details, and now you are building back up to figure out what they all mean. Start thinking about theory—either theories you have read about and are using as a frame of your study (e.g., gender as performance theory) or theories you are building yourself, as in the Grounded Theory tradition. Once you have a good idea of what is being communicated and how, go back to the data at least one more time to look for disconfirming evidence. Maybe you thought “men as outdoorsy” was of importance, but when you look hard, you note that women are presented as outdoorsy just as often. You just hadn’t paid attention. It is very important, as any kind of researcher but particularly as a qualitative researcher, to test yourself and your emerging interpretations in this way.

The Fourth and Final Step: The Write-Up

Only after you have fully completed analysis, with its many rounds of review and analysis, will you be able to write about what you found. The interpretation exists not in the data but in your analysis of the data. Before writing your results, you will want to very clearly describe how you chose the data here and all the possible limitations of this data (e.g., historical-trace problem or power problem; see chapter 16). Acknowledge any limitations of your sample. Describe the audience for the content, and discuss the implications of this. Once you have done all of this, you can put forth your interpretation of the communication of the content, linking to theory where doing so would help your readers understand your findings and what they mean more generally for our understanding of how the social world works. [3]

Analyzing Content: Helpful Hints and Pointers

Although every data set is unique and each researcher will have a different and unique research question to address with that data set, there are some common practices and conventions. When reviewing your data, what do you look at exactly? How will you know if you have seen a pattern? How do you note or mark your data?

Let’s start with the last question first. If your data is stored digitally, there are various ways you can highlight or mark up passages. You can, of course, do this with literal highlighters, pens, and pencils if you have print copies. But there are also qualitative software programs to help you store the data, retrieve the data, and mark the data. This can simplify the process, although it cannot do the work of analysis for you.

Qualitative software can be very expensive, so the first thing to do is to find out if your institution (or program) has a universal license its students can use. If they do not, most programs have special student licenses that are less expensive. The two most used programs at this moment are probably ATLAS.ti and NVivo. Both can cost more than $500 [4] but provide everything you could possibly need for storing data, content analysis, and coding. They also have a lot of customer support, and you can find many official and unofficial tutorials on how to use the programs’ features on the web. Dedoose, created by academic researchers at UCLA, is a decent program that lacks many of the bells and whistles of the two big programs. Instead of paying all at once, you pay monthly, as you use the program. The monthly fee is relatively affordable (less than $15), so this might be a good option for a small project. HyperRESEARCH is another basic program created by academic researchers, and it is free for small projects (those that have limited cases and material to import). You can pay a monthly fee if your project expands past the free limits. I have personally used all four of these programs, and they each have their pluses and minuses.

Regardless of which program you choose, you should know that none of them will actually do the hard work of analysis for you. They are incredibly useful for helping you store and organize your data, and they provide abundant tools for marking, comparing, and coding your data so you can make sense of it. But making sense of it will always be your job alone.

So let’s say you have some software, and you have uploaded all of your content into the program: video clips, photographs, transcripts of news stories, articles from magazines, even digital copies of college scrapbooks. Now what do you do? What are you looking for? How do you see a pattern? The answers to these questions will depend partially on the particular research question you have, or at least the motivation behind your research. Let’s go back to the idea of looking at gender presentations in magazines from the 1950s to the 1970s. Here are some things you can look at and code in the content: (1) actions and behaviors, (2) events or conditions, (3) activities, (4) strategies and tactics, (5) states or general conditions, (6) meanings or symbols, (7) relationships/interactions, (8) consequences, and (9) settings. Table 17.1 lists these with examples from our gender presentation study.

Table 17.1. Examples of What to Note During Content Analysis

One thing to note about the examples in table 17.1: sometimes we note (mark, record, code) a single example, while other times, as in “settings,” we are recording a recurrent pattern. To help you spot patterns, it is useful to mark every setting, including a notation on gender. Using software can help you do this efficiently. You can then call up “setting by gender” and note this emerging pattern. There’s an element of counting here, which we normally think of as quantitative data analysis, but we are using the count to identify a pattern that will be used to help us interpret the communication. Content analyses often include counting as part of the interpretive (qualitative) process.

In your own study, you may not need or want to look at all of the elements listed in table 17.1. Even in our imagined example, some are more useful than others. For example, “strategies and tactics” is a bit of a stretch here. In studies that are looking specifically at, say, policy implementation or social movements, this category will prove much more salient.

Another way to think about “what to look at” is to consider aspects of your content in terms of units of analysis. You can drill down to the specific words used (e.g., the adjectives commonly used to describe “men” and “women” in your magazine sample) or move up to the more abstract level of concepts used (e.g., the idea that men are more rational than women). Counting for the purpose of identifying patterns is particularly useful here. How many times is that idea of women’s irrationality communicated? How is it communicated (in comic strips, fictional stories, editorials, etc.)? Does the incidence of the concept change over time? Perhaps the “irrational woman” was everywhere in the 1950s, but by the 1970s, it is no longer showing up in stories and comics. By tracing its usage and prevalence over time, you might come up with a theory or story about gender presentation during the period. Table 17.2 provides more examples of using different units of analysis for this work along with suggestions for effective use.

Table 17.2. Examples of Unit of Analysis in Content Analysis
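
To illustrate the kind of counting described above, here is a minimal Python sketch that tallies one coded trope by decade; the coded records are invented stand-ins for the gender-presentation example.

```python
from collections import Counter

# Hypothetical coded records produced during review: (year, code) pairs.
coded_passages = [
    (1952, "irrational woman"), (1955, "irrational woman"),
    (1958, "men as outdoorsy"), (1963, "irrational woman"),
    (1971, "men as outdoorsy"), (1974, "men as outdoorsy"),
]

# Tally one code's incidence by decade to see whether it fades over time.
by_decade = Counter(
    (year // 10) * 10
    for year, code in coded_passages
    if code == "irrational woman"
)

for decade in sorted(by_decade):
    print(f"{decade}s: {by_decade[decade]}")
# 1950s: 2
# 1960s: 1
```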

Every qualitative content analysis is unique in its particular focus and particular data used, so there is no single correct way to approach analysis. You should have a better idea, however, of what kinds of things to look for and how to look for them. The next two chapters will take you further into the coding process, the primary analytical tool for qualitative research in general.

Further Readings

Cidell, Julie. 2010. “Content Clouds as Exploratory Qualitative Data Analysis.” Area 42(4):514–523. A demonstration of using visual “content clouds” as a form of exploratory qualitative data analysis using transcripts of public meetings and content of newspaper articles.

Hsieh, Hsiu-Fang, and Sarah E. Shannon. 2005. “Three Approaches to Qualitative Content Analysis.” Qualitative Health Research 15(9):1277–1288. Distinguishes three distinct approaches to QCA: conventional, directed, and summative. Uses hypothetical examples from end-of-life care research.

Jackson, Romeo, Alex C. Lange, and Antonio Duran. 2021. “A Whitened Rainbow: The In/Visibility of Race and Racism in LGBTQ Higher Education Scholarship.” Journal Committed to Social Change on Race and Ethnicity (JCSCORE) 7(2):174–206.* Using a “critical summative content analysis” approach, examines research published on LGBTQ people between 2009 and 2019.

Krippendorff, Klaus. 2018. Content Analysis: An Introduction to Its Methodology . 4th ed. Thousand Oaks, CA: SAGE. A very comprehensive textbook on both quantitative and qualitative forms of content analysis.

Mayring, Philipp. 2022. Qualitative Content Analysis: A Step-by-Step Guide . Thousand Oaks, CA: SAGE. Formulates an eight-step approach to QCA.

Messinger, Adam M. 2012. “Teaching Content Analysis through ‘Harry Potter.’” Teaching Sociology 40(4):360–367. This is a fun example of a relatively brief foray into content analysis using the music found in Harry Potter films.

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook . Thousand Oaks, CA: SAGE. Although a helpful guide to content analysis in general, be warned that this textbook definitely favors quantitative over qualitative approaches to content analysis.

Schreier, Margrit. 2012. Qualitative Content Analysis in Practice . Thousand Oaks, CA: SAGE. Arguably the most accessible guidebook for QCA, written by a professor based in Germany.

Weber, Matthew A., Shannon Caplan, Paul Ringold, and Karen Blocksom. 2017. “Rivers and Streams in the Media: A Content Analysis of Ecosystem Services.” Ecology and Society 22(3).* Examines the content of a blog hosted by National Geographic and articles published in The New York Times and the Wall Street Journal for stories on rivers and streams (e.g., water quality, flooding).

  • There are ways of handling content analysis quantitatively, however. Some practitioners therefore specify qualitative content analysis (QCA). In this chapter, all content analysis is QCA unless otherwise noted. ↵
  • Note that some qualitative software allows you to upload whole films or film clips for coding. You will still have to get access to the film, of course. ↵
  • See chapter 20 for more on the final presentation of research. ↵
  • Actually, ATLAS.ti is an annual license, while NVivo is a perpetual license, but both are going to cost you at least $500 to use. Student rates may be lower. And don’t forget to ask your institution or program if they already have a software license you can use. ↵

A method of both data collection and data analysis in which a given content (textual, visual, graphic) is examined systematically and rigorously to identify meanings, themes, patterns and assumptions.  Qualitative content analysis (QCA) is concerned with gathering and interpreting an existing body of material.    

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.

Grad Coach

What Is Qualitative Content Analysis?

QCA explained simply (with examples)

By: Jenna Crosley (PhD). Reviewed by: Dr Eunice Rautenbach (DTech) | February 2021

If you’re in the process of preparing for your dissertation, thesis or research project, you’ve probably encountered the term “ qualitative content analysis ” – it’s quite a mouthful. If you’ve landed on this post, you’re probably a bit confused about it. Well, the good news is that you’ve come to the right place…

Overview: Qualitative Content Analysis

  • What (exactly) is qualitative content analysis
  • The two main types of content analysis
  • When to use content analysis
  • How to conduct content analysis (the process)
  • The advantages and disadvantages of content analysis

1. What is content analysis?

Content analysis is a qualitative analysis method that focuses on recorded human artefacts such as manuscripts, voice recordings and journals. Content analysis investigates these written, spoken and visual artefacts without explicitly extracting data from participants – this is called unobtrusive research.

In other words, with content analysis, you don’t necessarily need to interact with participants (although you can if necessary); you can simply analyse the data that they have already produced. With this type of analysis, you can analyse data such as text messages, books, Facebook posts, videos, and audio (just to mention a few).

The basics – explicit and implicit content

When working with content analysis, explicit and implicit content will play a role. Explicit data is transparent and easy to identify, while implicit data is that which requires some form of interpretation and is often of a subjective nature. Sounds a bit fluffy? Here’s an example:

Joe: Hi there, what can I help you with? 

Lauren: I recently adopted a puppy and I’m worried that I’m not feeding him the right food. Could you please advise me on what I should be feeding? 

Joe: Sure, just follow me and I’ll show you. Do you have any other pets?

Lauren: Only one, and it tweets a lot!

In this exchange, the explicit data indicates that Joe is helping Lauren to find the right puppy food. Joe asks Lauren whether she has any pets aside from her puppy. This data is explicit because it requires no interpretation.

On the other hand, implicit data , in this case, includes the fact that the speakers are in a pet store. This information is not clearly stated but can be inferred from the conversation, where Joe is helping Lauren to choose pet food. An additional piece of implicit data is that Lauren likely has some type of bird as a pet. This can be inferred from the way that Lauren states that her pet “tweets”.

As you can see, explicit and implicit data both play a role in human interaction and are an important part of your analysis. However, it’s important to differentiate between these two types of data when you’re undertaking content analysis. Interpreting implicit data can be rather subjective, as conclusions are based on the researcher’s interpretation. This can introduce an element of bias, which risks skewing your results.


2. The two types of content analysis

Now that you understand the difference between implicit and explicit data, let’s move on to the two general types of content analysis: conceptual and relational content analysis. Importantly, while conceptual and relational content analysis both follow similar steps initially, the aims and outcomes of each are different.

Conceptual analysis focuses on the number of times a concept occurs in a set of data and is generally focused on explicit data. For example, if you were to have the following conversation:

Marie: She told me that she has three cats.

Jean: What are her cats’ names?

Marie: I think the first one is Bella, the second one is Mia, and… I can’t remember the third cat’s name.

In this data, you can see that the word “cat” has been used three times. Through conceptual content analysis, you can deduce that cats are the central topic of the conversation. You can also perform a frequency analysis, where you assess the term’s frequency in the data. For example, in the exchange above, the word “cat” makes up 9% of the data. In other words, conceptual analysis brings a little bit of quantitative analysis into your qualitative analysis.
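
That frequency figure is easy to reproduce programmatically. This Python sketch counts cat-related tokens in the exchange above; note that the exact percentage depends on how you choose to tokenise the text.

```python
import re

conversation = (
    "She told me that she has three cats. "
    "What are her cats' names? "
    "I think the first one is Bella, the second one is Mia, "
    "and I can't remember the third cat's name."
)

words = re.findall(r"[a-z']+", conversation.lower())
cat_mentions = sum(1 for word in words if word.startswith("cat"))

print(f"{cat_mentions} of {len(words)} words ({cat_mentions / len(words):.0%})")
# 3 of 33 words (9%)
```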

As you can see, the above data is without interpretation and focuses on explicit data. Relational content analysis, on the other hand, takes a more holistic view by focusing more on implicit data in terms of context, surrounding words and relationships.

There are three types of relational analysis:

  • Affect extraction
  • Proximity analysis
  • Cognitive mapping

Affect extraction is when you assess concepts according to emotional attributes. These emotions are typically mapped on scales, such as a Likert scale or a rating scale ranging from 1 to 5, where 1 is “very sad” and 5 is “very happy”.

If participants are talking about their achievements, they are likely to be given a score of 4 or 5, depending on how good they feel about it. If a participant is describing a traumatic event, they are likely to have a much lower score, either 1 or 2.
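
As a toy illustration of this kind of scoring, the sketch below averages values from a hand-built emotion lexicon; the words and their 1-to-5 ratings are invented assumptions, and a real study would rely on a validated lexicon and human raters.

```python
# Hypothetical lexicon mapping words to a 1-5 scale (1 = very sad, 5 = very happy).
emotion_lexicon = {
    "proud": 5, "happy": 4, "fine": 3, "worried": 2, "devastated": 1,
}

def affect_score(utterance):
    """Average the 1-5 scores of any lexicon words found in the utterance."""
    scores = [score for word, score in emotion_lexicon.items()
              if word in utterance.lower()]
    return sum(scores) / len(scores) if scores else None

print(affect_score("I'm so proud and happy about my results"))  # 4.5
print(affect_score("I was devastated and worried for weeks"))   # 1.5
```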

Proximity analysis identifies explicit terms (such as those found in a conceptual analysis) and the patterns in terms of how they co-occur in a text. In other words, proximity analysis investigates the relationship between terms and aims to group these to extract themes and develop meaning.

Proximity analysis is typically utilised when you’re looking for hard facts rather than emotional, cultural, or contextual factors. For example, if you were to analyse a political speech, you may want to focus only on what has been said, rather than implications or hidden meanings. To do this, you would make use of explicit data, discounting any underlying meanings and implications of the speech.

Lastly, there’s cognitive mapping, which can be used alongside either affect extraction or proximity analysis. Cognitive mapping involves taking different texts and comparing them in a visual format – i.e., a cognitive map. Typically, you’d use cognitive mapping in studies that assess changes in terms, definitions, and meanings over time. It can also serve as a way to visualise affect extraction or proximity analysis results, presented as a graphic map.

Example of a cognitive map

To recap on the essentials, content analysis is a qualitative analysis method that focuses on recorded human artefacts . It involves both conceptual analysis (which is more numbers-based) and relational analysis (which focuses on the relationships between concepts and how they’re connected).


3. When should you use content analysis?

Content analysis is a useful tool that provides insight into trends of communication. For example, you could use a discussion forum as the basis of your analysis and look at the types of things the members talk about as well as how they use language to express themselves. Content analysis is flexible in that it can be applied to the individual, group, and institutional level.

Content analysis is typically used in studies where the aim is to better understand factors such as behaviours, attitudes, values, emotions, and opinions. For example, you could use content analysis to investigate an issue in society, such as miscommunication between cultures. In this example, you could compare patterns of communication in participants from different cultures, which will allow you to create strategies for avoiding misunderstandings in intercultural interactions.

Another example could include conducting content analysis on a publication such as a book. Here you could gather data on the themes, topics, language use and opinions reflected in the text to draw conclusions regarding the political (such as conservative or liberal) leanings of the publication.


4. How to conduct a qualitative content analysis

Conceptual and relational content analysis differ in terms of their exact process; however, there are some similarities. Let’s have a look at these first – i.e., the generic process:

  • Recap on your research questions
  • Undertake bracketing to identify biases
  • Operationalise your variables and develop a coding scheme
  • Code the data and undertake your analysis

Step 1 – Recap on your research questions

It’s always useful to begin a project with research questions, or at least with an idea of what you are looking for. In fact, if you’ve spent time reading this blog, you’ll know that it’s useful to recap on your research questions, aims and objectives when undertaking pretty much any research activity. In the context of content analysis, it’s difficult to know what needs to be coded and what doesn’t without a clear view of the research questions.

For example, if you were to code a conversation focused on basic issues of social justice, you may be met with a wide range of topics that may be irrelevant to your research. However, if you approach this data set with the specific intent of investigating opinions on gender issues, you will be able to focus on this topic alone, which would allow you to code only what you need to investigate.


Step 2 – Reflect on your personal perspectives and biases

It’s vital that you reflect on your own preconceptions of the topic at hand and identify the biases that you might drag into your content analysis – this is called “bracketing”. By identifying these upfront, you’ll be more aware of them and less likely to have them subconsciously influence your analysis.

For example, if you were to investigate how a community converses about unequal access to healthcare, it is important to assess your views to ensure that you don’t project these onto your understanding of the opinions put forth by the community. If you have access to medical aid, for instance, you should not allow this to interfere with your examination of unequal access.


Step 3 – Operationalise your variables and develop a coding scheme

Next, you need to operationalise your variables. But what does that mean? Simply put, it means that you have to define each variable or construct. Give every item a clear definition – what does it mean (include) and what does it not mean (exclude). For example, if you were to investigate children’s views on healthy foods, you would first need to define what age group/range you’re looking at, and then also define what you mean by “healthy foods”.

In combination with the above, it is important to create a coding scheme, which will consist of information about your variables (how you defined each variable), as well as a process for analysing the data. For this, you would refer back to how you operationalised/defined your variables so that you know how to code your data.

For example, when coding, when should you code a food as “healthy”? What makes a food choice healthy? Is it the absence of sugar or saturated fat? Is it the presence of fibre and protein? It’s very important to have clearly defined variables to achieve consistent coding – without this, your analysis will get very muddy, very quickly.
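
To illustrate, an operationalised variable and its coding rule can be captured along these lines; the include/exclude word lists below are hypothetical stand-ins for the clear definitions you would actually write.

```python
# A statement is coded only if it mentions an included term and no excluded term.
coding_scheme = {
    "healthy food": {
        "include": ["broccoli", "peaches", "bananas", "salad"],
        "exclude": ["candy", "soda", "chips"],
    },
}

def codes_for(statement):
    """Apply each variable's include/exclude rule to one statement."""
    s = statement.lower()
    return [
        code for code, rule in coding_scheme.items()
        if any(word in s for word in rule["include"])
        and not any(word in s for word in rule["exclude"])
    ]

print(codes_for("I like broccoli and peaches"))   # ['healthy food']
print(codes_for("I like candy more than salad"))  # []
```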


Step 4 – Code and analyse the data

The next step is to code the data. At this stage, there are some differences between conceptual and relational analysis.

As described earlier in this post, conceptual analysis looks at the existence and frequency of concepts, whereas a relational analysis looks at the relationships between concepts. For both types of analyses, it is important to pre-select a concept that you wish to assess in your data. Using the example of studying children’s views on healthy food, you could pre-select the concept of “healthy food” and assess the number of times the concept pops up in your data.

Here is where conceptual and relational analysis start to differ.

At this stage of conceptual analysis, it is necessary to decide on the level of analysis you’ll perform on your data, and whether this will exist on the word, phrase, sentence, or thematic level. For example, will you code the phrase “healthy food” on its own? Will you code each term relating to healthy food (e.g., broccoli, peaches, bananas, etc.) with the code “healthy food” or will these be coded individually? It is very important to establish this from the get-go to avoid inconsistencies that could result in you having to code your data all over again.

Relational analysis, on the other hand, requires you to decide on the type of analysis you’ll use. So, will you use affect extraction? Proximity analysis? Cognitive mapping? A mix? It’s vital to determine the type of analysis before you begin to code your data so that you can maintain the reliability and validity of your research.

research and analysis of content

How to conduct conceptual analysis

First, let’s have a look at the process for conceptual analysis.

Once you’ve decided on your level of analysis, you need to establish how you will code your concepts, and how many of these you want to code. Here you can choose whether you want to code in a deductive or inductive manner. Just to recap, deductive coding is when you begin the coding process with a set of pre-determined codes, whereas inductive coding entails the codes emerging as you progress with the coding process. Here it is also important to decide what should be included and excluded from your analysis, and also what levels of implication you wish to include in your codes.

For example, if you have the concept of “tall”, can you include “up in the clouds”, derived from the sentence, “the giraffe’s head is up in the clouds” in the code, or should it be a separate code? In addition to this, you need to know what levels of words may be included in your codes or not. For example, if you say, “the panda is cute” and “look at the panda’s cuteness”, can “cute” and “cuteness” be included under the same code?
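
One common way to settle questions like the “cute”/“cuteness” one is to normalise words to a shared stem before coding. The sketch below applies NLTK’s Porter stemmer to the post’s own example sentences; treating stems as the unit for a code is one possible choice, not a requirement.

```python
from nltk.stem import PorterStemmer  # pip install nltk

stemmer = PorterStemmer()

for sentence in ["the panda is cute", "look at the panda's cuteness"]:
    print([stemmer.stem(word) for word in sentence.split()])

# "cute" and "cuteness" reduce to the same stem, so they can be counted
# under one code; if your scheme treats them as distinct, skip stemming
# and match surface forms instead.
```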

Once you’ve considered the above, it’s time to code the text. We’ve already published a detailed post about coding, so we won’t go into that process here. Once you’re done coding, you can move on to analysing your results. This is where you will aim to find generalisations in your data, and thus draw your conclusions.

How to conduct relational analysis

Now let’s return to relational analysis.

As mentioned, you want to look at the relationships between concepts. To do this, you’ll need to create categories by reducing your data (in other words, grouping similar concepts together) and then also code for words and/or patterns. These are both done with the aim of discovering whether these words exist, and if they do, what they mean.

Your next step is to assess your data and to code the relationships between your terms and meanings, so that you can move on to your final step, which is to sum up and analyse the data.

To recap, it’s important to start your analysis process by reviewing your research questions and identifying your biases. From there, you need to operationalise your variables, code your data and then analyse it.


5. What are the pros & cons of content analysis?

One of the main advantages of content analysis is that it allows you to use a mix of quantitative and qualitative research methods, which results in a more scientifically rigorous analysis.

For example, with conceptual analysis, you can count the number of times that a term or a code appears in a dataset, which can be assessed from a quantitative standpoint. In addition to this, you can then use a qualitative approach to investigate the underlying meanings of these and relationships between them.

Content analysis is also unobtrusive and therefore poses fewer ethical issues than some other analysis methods. As the content you’ll analyse oftentimes already exists, you’ll analyse what has been produced previously, and so you won’t have to collect data directly from participants. When coded correctly, data is analysed in a very systematic and transparent manner, which means that issues of replicability (how possible it is to recreate research under the same conditions) are reduced greatly.

On the downside , qualitative research (in general, not just content analysis) is often critiqued for being too subjective and for not being scientifically rigorous enough. This is where reliability (how replicable a study is by other researchers) and validity (how suitable the research design is for the topic being investigated) come into play – if you take these into account, you’ll be on your way to achieving sound research results.


Recap: Qualitative content analysis

In this post, we’ve covered a lot of ground – from what qualitative content analysis is and when to use it, to the two types of content analysis and the steps involved in conducting one.

If you have any questions about qualitative content analysis, feel free to leave a comment below. If you’d like 1-on-1 help with your qualitative content analysis, be sure to book an initial consultation with one of our friendly Research Coaches.



Content Analysis

Scott Tunison

Part of the book series: Springer Texts in Education (SPTE)


14.1 Brief History of Content Analysis

Content analysis emerged from studies of archived texts (Vogt et al., 2012 ), such as newspapers, transcripts of speeches, and magazines. Ellingson ( 2011 ) noted that content analysis resides in the postpositivist typology which allows researchers to “conduct an inductive analysis of textual data, form a typology grounded in the data … use the derived typology to sort data into categories, and then count the frequencies of each theme or category across data” (p. 596).

As is the case with so many aspects of the academic research enterprise, the method of content analysis is not without controversy. For example, Creswell (2011) argued that content analysis is a quantitative method misappropriated by mixed-methods researchers as a qualitative process. He observed that content analysis is “a quantitative procedure involving the collection of qualitative data and its transformation and analysis by quantitative means” (p. 278). Similarly, Vogt et al. (2012) opined that content analysis involves “conversion of texts into quantitative data, through methods such as determining the frequency of words or phrases or characterising relationships among words and phrases in texts” (p. 338). Furthermore, “some [researchers and critics] are insistent that content analysis refers only to computer-assisted coding and analysis of text” (Vogt et al., p. 338). Cohen et al. (2018) drew on the work of a plethora of content analysis advocates to observe that “qualitative content analysis defines a strict and systematic set of procedures for the rigorous analysis, examination, replication, inference, and verification of the contents of written data” (p. 674).

14.2 The Method of Content Analysis

A survey of the literature concerning content analysis reveals multiple typologies and approaches to the method. For instance, Cohen and colleagues averred that content analysis “takes texts and analyses, reduces and interrogates them into summary form through the use of both pre-existing categories and emergent themes … us[ing] systematic, replicable, observable and rule-governed forms of analysis in a theory-dependent system” (2018, p. 675). Newby (2010) identified three types of content analysis: conventional content analysis (using emergent coding), directed content analysis (using pre-determined coding), and summative content analysis (using predetermined keywords to generate emergent codes). Krippendorff (in Cohen et al., 2018) argued that content analysis: (i) describes the appreciable features of textual communication (i.e., asking who is saying what to whom and how it was said); (ii) deduces the precursors of the communication (i.e., the reasons for, the purposes behind, and the context for the communication); and (iii) examines the repercussions of the communication (i.e., its effects). Finally, Krippendorff commented that “content analysis is most successful when it can break down ‘linguistically constituted facts’ into four classes: attributions, social relationships, public behaviours and institutional realities” (in Cohen et al., p. 675).

14.3 What Does Content Analysis Look like?

Simply stated, the qualitative part of content analysis starts with bodies of text; sets linguistic units of analysis (e.g., words, phrases, sentences, paragraphs) and categories for those units; pores over the texts to code and categorize them; and both tallies and documents the frequency of occurrence of whatever linguistic unit was selected. Anderson and Arsenault (1998) put it more simply: “content analysis involves counting concepts, words, or occurrences in documents and reporting them in tabular form” (p. 102). Broadly, though, according to Cohen and colleagues, content analysis has three primary processes: “breaking down text into units of analysis, undertaking statistical analysis of the units, and presenting the analyses in as economical a form as possible” (p. 675).
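To make the counting-and-tabulating step concrete, here is a minimal Python sketch. It is not drawn from the chapter itself: the three survey comments are invented, and single words are assumed as the unit of analysis.

```python
from collections import Counter
import re

# Hypothetical corpus: three short open-ended survey comments.
texts = [
    "The workload was heavy, but the feedback helped me improve.",
    "Feedback was slow, and the workload felt unmanageable.",
    "Great feedback and a fair workload this semester.",
]

# Unit of analysis: individual words, lowercased and stripped of punctuation.
words = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]

# Tally each unit and report the most frequent in simple tabular form.
counts = Counter(words)
print(f"{'word':<12}{'frequency':>9}")
for word, freq in counts.most_common(5):
    print(f"{word:<12}{freq:>9}")
```

Real content analysis would, of course, count coded concepts rather than raw words, but the tally-and-report structure is the same.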

Perhaps not surprisingly, given the controversy surrounding the status of content analysis as a method, there is also a veritable cornucopia of process frameworks from which to choose. It is beyond the intent of this chapter to survey them all; thus, just one (the one which the author uses most frequently) is presented here, not as an exemplar for conducting content analysis, but merely as a trailhead for anyone considering incorporating content analysis into their research designs. To this end, Cohen et al. (2018) distill from the field an 11-step content analysis method, described in brief below; a minimal code sketch of the central coding steps follows the list.

Step 1—Define the research questions

Ground the focus of the analysis of content in the literature/theory informing the data collection efforts.

Consider the purpose of both the research in general as well as the information required to proceed with the next steps of the research.

Step 2—Define the population

The text to be analysed and, therefore, the collection of text informing the study defines the “population” to be sampled.

Step 3—Define the sample

Sampling strategies typical for human research apply to the selection of text required to inform content analysis research:

Probability or non-probability sample;

Stratified (including the specific strata to be used), random, convenience, purposive, domain, cluster, systematic, time, snowball sampling; and

Types of text/media—interview transcripts, open-ended survey comments, newspapers, journal/magazine articles, television/radio recordings or transcripts, social media texts.

Issues informing sampling techniques—such as representativeness, validity, reliability, size of sample—used in other research methods also apply to content analysis research.

Step 4—Define or clarify media context

Document the context from which the text is derived. For example:

The source of the text,

The purpose, setting, and audience for which it was originally produced,

Translation from original production to present format (e.g., transcription processes, authenticity), and

Sources of information upon which the text was produced and level of abridgement.

Step 5—Define unit of analysis

The unit of analysis can take many possible forms: a word, phrase, sentence, paragraph, or the entire text as a holistic unit.

A critical concept is to ensure that the units chosen are as discrete as possible to allow valid and reliable analysis.

Step 6—Define codes

Read the texts multiple times to become thoroughly familiar with them.

Identify patterns, inconsistencies, contradictions, differences among/between groups of ideas and/or sources.

Step 7—Build analysis categories

As the codes are defined and drawn from the texts, look for opportunities to organise them into coherent groups for analysis.

These groups may well overlap as some codes may apply to more than one category.

Step 8—Implement coding and categorisation

Once the analysis codes and categories are defined, implement the process of applying the codes and categorisation processes chosen to generate the data for analysis.

Step 9—Analyse the data

As the codes are applied and categories have been created, the data are now analysed.

Associations between and among codes and categories are powerful places to start in the analysis process but the focus of the analysis ought to be driven by the purpose of the research and the research questions defined at the start of the project.

Step 10—Summarising

Describe the results of the analysis.

Narrow the focus of the analyses to answer the research questions and extend theory.

Identify areas requiring further research, analysis, or categorisation.

Step 11—Make speculative inferences

Pose possible interpretations, explanations, extrapolations, etc.

Formulate hypotheses and/or theories.
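As promised above, the following Python sketch operationalizes steps 5 through 9 under loudly invented assumptions: the sample text, the keyword-based codebook, and the categories are all hypothetical, and real coding rests on researcher judgement rather than bare keyword matching.

```python
import re
from collections import Counter

# Step 6 (hypothetical codebook): each code is signalled by keywords.
codebook = {
    "workload": ["workload", "hours", "overloaded"],
    "feedback": ["feedback", "comments", "response"],
}
# Step 7 (hypothetical categories): groups of related codes.
categories = {"course demands": ["workload"], "instructor support": ["feedback"]}

text = ("The workload was overwhelming this term. "
        "Still, the instructor's feedback on drafts was detailed and fast.")

# Step 5: define the unit of analysis -- here, sentences.
units = [u.strip() for u in re.split(r"(?<=[.!?])\s+", text) if u.strip()]

# Step 8: apply the codes to each unit.
tally = Counter()
for unit in units:
    for code, keywords in codebook.items():
        if any(k in unit.lower() for k in keywords):
            tally[code] += 1

# Step 9: begin analysis by aggregating code counts into categories.
for category, codes in categories.items():
    print(category, sum(tally[c] for c in codes))
```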

14.4 Strengths and Limitations of Content Analysis

As with any method, content analysis has tremendous strengths and, at the same time, the method is subject to multiple limitations.

Given text preparation processes (e.g., transcriptions, implementing sampling strategies, multiple readings of text), content analysis is time-consuming (Cohen et al., 2018).

Reducing some source materials to text through transcription procedures for analysis risks losing or missing the nuances embedded in many forms of communication (Robson, 2002 ).

The method is subject to researcher bias, as the coding and categorisation processes may be influenced—both wittingly and unwittingly—by what the researcher may have been expecting to find, rather than what was there to be found (Ezzy, 2002).

The method also has numerous strengths:

Since the method retains the data in its original form, it is unobtrusive; it is possible to re-analyse and verify findings (Cohen et al., 2018 ).

Content analysis is a structured and demonstrable approach to drawing meaning from media in context, making explicit the rules, assumptions, and decisions used in coding and analysis (Mayring, 2004).

Engagement Activities

With a partner(s), select an extended text for analysis and identify a research question that might reasonably be addressed within that text. Then, as individuals, follow Cohen et al.’s content analysis steps 5 through 10 to draw as much information as possible from the text. Then, as partner(s), compare and contrast the “results” from your individual analyses of the text. Pay particular attention to each person’s rationale for their analytic and interpretation decisions. What implications do the differences and similarities have for the practice of content analysis and the conclusions one may be able to draw from such research?

Brainstorm a list of topics that interest you and might be appropriate for content analysis. Select two topics from the list and, for each one, describe:

Potential sources of text that might contain data relevant to your research questions;

The strategies you would use to mitigate the potential impact of researcher bias on the factors that you may examine and the ways in which you might observe them;

The context/setting for each project and the ways in which you could ensure appropriate ethical considerations such as informed consent and participant beneficence; and

The tools, structures, and approaches you could use to collect data via content analysis.

Anderson, G., & Arsenault, N. (1998). Fundamentals of educational research (2nd ed.). Routledge.


Cohen, L., Manion, L., & Morrison, K. (2018). Research methods in education (8th ed.). Routledge.

Creswell, J. (2011). Controversies in mixed methods research. In N. Denzin & Y. Lincoln (Eds.), The SAGE Handbook of Qualitative Research (4th ed., pp. 269–283). Sage.

Ellingson, L. (2011). Analysis and representation across the continuum. In N. Denzin & Y. Lincoln (Eds.), The SAGE Handbook of Qualitative Research (4th ed., pp. 595–610). Sage.

Ezzy, D. (2002). Qualitative analysis: Practice and innovation . Routledge.

Mayring, P. (2004). Qualitative content analysis. In U. Flick, E. von Kardoff, & I. Steinke (Eds.), A companion to qualitative research . Sage.

Newby, P. (2010). Research methods for education . Pearson Education.

Robson, C. (2002). Real world research (2nd ed.). Blackwell.

Vogt, W. P., Gardner, D., & Haeffele, L. (2012). When to use what research design . Guilford Press.

Online Resources

Adu, P. (2016). Qualitative analysis: Coding and categorizing data . https://youtu.be/v_mg7OBpb2Y

Duke University—Mod-U (2016). How to know you are coding correctly: Qualitative research methods . https://youtu.be/iL7Ww5kpnIM

Gramenz, G. (2014). How to code a document and create themes . https://youtu.be/sHv3RzKWNcQ

Shaw, A. (2019). NVivo 12 and thematic/content analysis . https://youtu.be/5s9-rg1ygWs


Demystifying Content Analysis

A. J. Kleinheksel, Nicole Rockich-Winston, Huda Tawfik, and Tasha R. Wyatt

The Medical College of Georgia at Augusta University, Augusta, Georgia; Central Michigan University College of Medicine, Mt. Pleasant, Michigan

Objective. In the course of daily teaching responsibilities, pharmacy educators collect rich data that can provide valuable insight into student learning. This article describes the qualitative data analysis method of content analysis, which can be useful to pharmacy educators because of its application in the investigation of a wide variety of data sources, including textual, visual, and audio files.

Findings. Both manifest and latent content analysis approaches are described, with several examples used to illustrate the processes. This article also offers insights into the variety of relevant terms and visualizations found in the content analysis literature. Finally, common threats to the reliability and validity of content analysis are discussed, along with suitable strategies to mitigate these risks during analysis.

Summary. This review of content analysis as a qualitative data analysis method will provide clarity and actionable instruction for both novice and experienced pharmacy education researchers.

INTRODUCTION

The Academy’s growing interest in qualitative research indicates an important shift in the field’s scientific paradigm. Whereas health science researchers have historically looked to quantitative methods to answer their questions, this shift signals that a purely positivist, objective approach is no longer sufficient to answer pharmacy education’s research questions. Educators who want to study their teaching and students’ learning will find content analysis an easily accessible, robust method of qualitative data analysis that can yield rigorous results for both publication and the improvement of their educational practice. Content analysis is a method designed to identify and interpret meaning in recorded forms of communication by isolating small pieces of the data that represent salient concepts and then applying or creating a framework to organize the pieces in a way that can be used to describe or explain a phenomenon. 1 Content analysis is particularly useful in situations where there is a large amount of unanalyzed textual data, such as those many pharmacy educators have already collected as part of their teaching practice. Because of its accessibility, content analysis is also an appropriate qualitative method for pharmacy educators with limited experience in educational research. This article will introduce and illustrate the process of content analysis as a way to analyze existing data, but also as an approach that may lead pharmacy educators to ask new types of research questions.

Content analysis is a well-established data analysis method that has evolved in its treatment of textual data. Content analysis was originally introduced as a strictly quantitative method, recording counts to measure the observed frequency of pre-identified targets in consumer research. 1 However, as the naturalistic qualitative paradigm became more prevalent in social sciences research and researchers became increasingly interested in the way people behave in natural settings, the process of content analysis was adapted into a more interesting and meaningful approach. Content analysis has the potential to be a useful method in pharmacy education because it can help educational researchers develop a deeper understanding of a particular phenomenon by providing structure in a large amount of textual data through a systematic process of interpretation. It also offers potential value because it can help identify problematic areas in student understanding and guide the process of targeted teaching. Several research studies in pharmacy education have used the method of content analysis. 2-7 Two studies in particular offer noteworthy examples: Wallman and colleagues employed manifest content analysis to analyze semi-structured interviews in order to explore what students learn during experiential rotations, 7 while Moser and colleagues adopted latent content analysis to evaluate open-ended survey responses on student perceptions of learning communities. 6 To elaborate on these approaches further, we will describe the two types of qualitative content analysis, manifest and latent, and demonstrate the corresponding analytical processes using examples that illustrate their benefit.

Qualitative Content Analysis

Content analysis rests on the assumption that texts are a rich data source with great potential to reveal valuable information about particular phenomena. 8 It is the process of considering both the participant and context when sorting text into groups of related categories to identify similarities and differences, patterns, and associations, both on the surface and implied within. 9-11 The method is considered high-yield in educational research because it is versatile and can be applied in both qualitative and quantitative studies. 12 While it is important to note that content analysis has application in visual and auditory artifacts (eg, an image or song), for our purposes we will largely focus on the most common application, which is the analysis of textual or transcribed content (eg, open-ended survey responses, print media, interviews, recorded observations, etc). The terminology of content analysis can vary throughout quantitative and qualitative literature, which may lead to some confusion among both novice and experienced researchers. However, there are also several agreed-upon terms and phrases that span the literature, as found in Table 1 .

Terms and Definitions Used in Qualitative Content Analysis


There is more often disagreement on terminology in the methodological approaches to content analysis, though the most common differentiation is between the two types of content: manifest and latent. In much of the literature, manifest content analysis is defined as describing what is occurring on the surface, what is literally present, and as “staying close to the text.” 8,13 Manifest content analysis is concerned with data that are easily observable both to researchers and the coders who assist in their analyses, without the need to discern intent or identify deeper meaning. It is content that can be recognized and counted with little training. Early applications of manifest analysis focused on identifying easily observable targets within text (eg, the number of instances a certain word appears in newspaper articles), film (eg, the occupation of a character), or interpersonal interactions (eg, tracking the number of times a participant blinks during an interview). 14 This application, in which frequency counts are used to understand a phenomenon, reflects a surface-level analysis and assumes there is objective truth in the data that can be revealed with very little interpretation. The number of times a target (ie, code) appears within the text is used as a way to understand its prevalence. Quantitative content analysis always describes a positivist manifest content analysis, in that the nature of truth is believed to be objective, observable, and measurable. Qualitative research, which favors the researcher’s interpretation of an individual’s experience, may also be used to analyze manifest content. However, the intent of the application is to describe a dynamic reality that cannot be separated from the lived experiences of the researcher. Although qualitative content analysis can be conducted whether knowledge is thought to be innate, acquired, or socially constructed, the purpose of qualitative manifest content analysis is to transcend simple word counts and delve into a deeper examination of the language in order to organize large amounts of text into categories that reflect a shared meaning. 15,16 The practical distinction between quantitative and qualitative manifest content analysis is the intention behind the analysis. The quantitative method seeks to generate a numerical value to either cite prevalence or use in statistical analyses, while the qualitative method seeks to identify a construct or concept within the text using specific words or phrases for substantiation, or to provide a more organized structure to the text being described.

Latent content analysis is most often defined as interpreting what is hidden deep within the text. In this method, the role of the researcher is to discover the implied meaning in participants’ experiences. 8,13 For example, in a transcribed exchange in an office setting, a participant might say to a coworker, “Yeah, here we are…another Monday. So exciting!” The researcher would apply context in order to discover the emotion being conveyed (ie, the implied meaning). In this example, the comment could be interpreted as genuine, it could be interpreted as a sarcastic comment made in an attempt at humor in order to develop or sustain social bonds with the coworker, or the context might imply that the sarcasm was meant to convey displeasure and end the interaction.

Latent content analysis acknowledges that the researcher is intimately involved in the analytical process and that their role is to actively use mental schema, theories, and lenses to interpret and understand the data. 10 Whereas manifest analyses are typically conducted in a way that the researcher is thought to maintain distance and separation from the objects of study, latent analyses underscore the importance of the researcher co-creating meaning with the text. 17 Adding nuance to this type of content, Potter and Levine‐Donnerstein argue that within latent content analysis, there are two distinct types: latent pattern and latent projective. 14 Latent pattern content analysis seeks to establish a pattern of characteristics in the text itself, while latent projective content analysis leverages the researcher’s own interpretations of the meaning of the text. While both approaches rely on codes that emerge from the content using the coder’s own perspectives and mental schema, the distinction between these two types of analyses is in their foci. 14 Though we do not agree, some researchers believe that all qualitative content analysis is latent content analysis. 11 These disagreements typically occur where there are differences in intent and where there are areas of overlap in the results. For example, both qualitative manifest and latent pattern content analyses may identify patterns as a result of their application. In their research designs, however, the researchers would have approached the content with different methodological intents: a manifest approach seeks only to describe what is observed, while a latent pattern approach seeks to discover an unseen pattern. At this point, these distinctions may seem too philosophical to serve a practical purpose, so we will attempt to clarify these concepts by presenting three types of analyses for illustrative purposes, beginning with a description of how codes are created and used.

Creating and Using Codes

Codes are the currency of content analysis. Researchers use codes to organize and understand their data. Through the coding process, pharmacy educators can systematically and rigorously categorize and interpret vast amounts of text for use in their educational practice or in publication. Codes themselves are short, descriptive labels that symbolically assign a summative or salient attribute to more than one unit of meaning identified in the text. 18 To create codes, a researcher must first become immersed in the data, which typically occurs when a researcher transcribes recorded data or conducts several readings of the text. This process allows the researcher to become familiar with the scope of the data, which spurs nascent ideas about potential concepts or constructs that may exist within it. If studying a phenomenon that has already been described through an existing framework, codes can be created a priori using theoretical frameworks or concepts identified in the literature. If there is no existing framework to apply, codes can emerge during the analytical process. However, emergent codes can also be created as addenda to a priori codes that were identified before the analysis begins if the a priori codes do not sufficiently capture the researcher’s area of interest.

The process of detecting emergent codes begins with identification of units of meaning. While there is no one way to decide what qualifies as a meaning unit, researchers typically define units of meaning differently depending on what kind of analysis is being conducted. As a general rule, when dialogue is being analyzed, such as interviews or focus groups, meaning units are identified as conversational turns, though a code can be as short as one or two words. In written text, such as student reflections or course evaluation data, the researcher must decide if the text should be divided into phrases or sentences, or remain as paragraphs. This decision is usually made based on how many different units of meaning are expressed in a block of text. For example, in a paragraph, if there are several thoughts or concepts being expressed, it is best to break up the paragraph into sentences. If one sentence contains multiple ideas of interest, making it difficult to separate one important thought or behavior from another, then the sentence can be divided into smaller units, such as phrases or sentence fragments. These phrases or sentence fragments are then coded as separate meaning units. Conversely, longer or more complex units of meaning should be condensed into shorter representations that still retain the original meaning in order to reduce the cognitive burden of the analytical process. This could entail removing verbal ticks (eg, “well, uhm…”) from transcribed data or simplifying a compound sentence. Condensation does not ascribe interpretation or implied meaning to a unit, but only shortens a meaning unit as much as possible while preserving the original meaning identified. 18 After condensation, a researcher can proceed to the creation of codes.
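As a rough sketch of what identifying and condensing meaning units can look like in code, consider the following. The two-line transcript and the list of verbal tics are invented, and in practice condensation is a judgement call rather than a regular expression.

```python
import re

# Hypothetical transcript excerpt; each speaker turn is one meaning unit.
transcript = """JR: Um, this medication should help with your heartburn.
SP: Okay, um, how often should I take it?"""

# Identify meaning units: one conversational turn per line.
units = [line.split(":", 1)[1].strip() for line in transcript.splitlines()]

# Condense each unit by stripping verbal tics while preserving meaning.
TICS = re.compile(r"\b(um|uh|uhm|er)\b[,.]?\s*", flags=re.IGNORECASE)
condensed = [re.sub(r"\s+", " ", TICS.sub("", u)).strip() for u in units]

print(condensed)
# ['this medication should help with your heartburn.',
#  'Okay, how often should I take it?']
```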

Many researchers begin their analyses with several general codes in mind that help guide their focus as defined by their research question, even in instances where the researcher has no a priori model or theory. For example, if a group of instructors are interested in examining recorded videos of their lectures to identify moments of student engagement, they may begin with using generally agreed upon concepts of engagement as codes, such as students “raising their hands,” “taking notes,” and “speaking in class.” However, as the instructors continue to watch their videos, they may notice other behaviors which were not initially anticipated. Perhaps students were seen creating flow charts based on information presented in class. Alternatively, perhaps instructors wanted to include moments when students posed questions to their peers without being prompted. In this case, the instructors would allow the codes of “creating graphic organizers” and “questioning peers” to emerge as additional ways to identify the behavior of student engagement.

Once a researcher has identified condensed units of meaning and labeled them with codes, the codes are then sorted into categories which can help provide more structure to the data. In the above example of recorded lectures, perhaps the category of “verbal behaviors” could be used to group the codes of “speaking in class” and “questioning peers.” For complex analyses, subcategories can also be used to better organize a large amount of codes, but solely at the discretion of the researcher. Two or more categories of codes are then used to identify or support a broader underlying meaning which develops into themes. Themes are most often employed in latent analyses; however, they are appropriate in manifest analyses as well. Themes describe behaviors, experiences, or emotions that occur throughout several categories. 18 Figure 1 illustrates this process. Using the same videotaped lecture example, the instructors might identify two themes of student engagement, “active engagement” and “passive engagement,” where active engagement is supported by the category of “verbal behavior” and also a category that includes the code of “raising their hands” (perhaps something along the lines of “pursuing engagement”), and the theme of “passive engagement” is supported by a category used to organize the behaviors of “taking notes” and “creating graphic organizers.”


The Process of Qualitative Content Analysis
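One way to make the codes-to-categories-to-themes structure explicit is to record it as nested data. This sketch reuses the recorded-lecture engagement example from the text; the theme, category, and code labels follow that example, except the category grouping the note-taking behaviors, which the passage leaves unnamed and which is invented here.

```python
# Hypothetical taxonomy for the recorded-lecture example: codes roll up
# into categories, and categories roll up into themes.
taxonomy = {
    "active engagement": {
        "verbal behavior": ["speaking in class", "questioning peers"],
        "pursuing engagement": ["raising their hands"],
    },
    "passive engagement": {
        # Category name invented; the source text leaves it unnamed.
        "independent processing": ["taking notes", "creating graphic organizers"],
    },
}

def locate(code: str):
    """Return the (theme, category) that a coded meaning unit supports."""
    for theme, cats in taxonomy.items():
        for category, codes in cats.items():
            if code in codes:
                return theme, category
    return None

print(locate("taking notes"))  # ('passive engagement', 'independent processing')
```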

To more fully demonstrate the process of content analysis and the generation and use of codes, categories, and themes, we present and describe examples of both manifest and latent content analysis. Given that there are multiple ways to create and use codes, our examples illustrate both processes of creating and using a predetermined set of codes. Regardless of the kind of content analysis instructors want to conduct, the initial steps are the same. The instructor must analyze the data using codes as a sense-making process.

Manifest Content Analysis

The first form of analysis, manifest content analysis, examines text for elements that exist on the surface of the text, the meaning of which is taken at face value. Schools and colleges of pharmacy may benefit from conducting manifest content analyses at a programmatic level, including analysis of student evaluations to determine the value of certain courses, or analysis of recruitment materials for addressing issues of cultural humility in a uniform manner. Such uses for manifest content analysis may help administrators make more data-based decisions about students and courses. However, for our example of manifest content analysis, we illustrate the use of content analysis in informing instruction for a single pharmacy educator ( Figure 2 ).


A Student’s Completed Beta-blocker Case with Codes in Underlined Bold Text

In the example, a pharmacology instructor is trying to assess students’ understanding of three concepts related to the beta-blocker class of drugs: indication of the drug, relevance of family history, and contraindications and precautions. To do so, the instructor asks the students to write a patient case in which beta-blockers are indicated. The instructor gives the students the following prompt: “Reverse-engineer a case in which beta-blockers would be prescribed to the patient. Include a history of the present illness, the patient’s medical, family, and social history, medications, allergies, and relevant lab tests.” Figure 2 is a hypothetical student’s completed assignment, in which they demonstrate their understanding of when and why a beta-blocker would be prescribed.

The student-generated cases are then treated as data and analyzed for the presence of the three previously identified indicators of understanding in order to help the instructor make decisions about where and how to focus future teaching efforts related to this drug class. Codes are created a priori out of the instructor’s interest in analyzing students’ understanding of the concepts related to beta-blocker prescriptions. A codebook ( Table 2 ) is created with the following columns: name of code, code description, and examples of the code. This codebook helps an individual researcher to approach their analysis systematically, but it can also facilitate coding by multiple coders who would apply the same rules outlined in the codebook to the coding process.

Example Code Book Created for Manifest Content Analysis

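A codebook like the one described above can also be kept as structured data so that every coder applies identical rules. In this sketch the three code names follow the beta-blocker example, while the descriptions and examples are invented placeholders rather than the article's actual codebook entries.

```python
# Hypothetical codebook with the three columns described above:
# name of code, code description, and examples of the code.
codebook = [
    {"name": "indication",
     "description": "Statements showing why a beta-blocker is warranted.",
     "examples": ["complains of palpitations", "history of hypertension"]},
    {"name": "family history",
     "description": "References to relevant disease in the patient's family.",
     "examples": ["father had a heart attack at 52"]},
    {"name": "contraindications and precautions",
     "description": "Conditions cautioning against beta-blocker use.",
     "examples": ["history of asthma", "bradycardia"]},
]

# Print the codebook as a quick reference sheet for coders.
for entry in codebook:
    print(f"{entry['name']}: {entry['description']}")
    for example in entry["examples"]:
        print(f"  e.g., {example}")
```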

Using multiple coders introduces complexity to the analysis process, but it is oftentimes the only practical way to analyze large amounts of data. To ensure that all coders are working in tandem, they must establish inter-rater reliability as part of their training process. This process requires that a single form of text be selected, such as one student evaluation. After reviewing the codebook and receiving instruction, everyone on the team individually codes the same piece of data. While calculating percentage agreement has sometimes been used to establish inter-rater reliability, most publication editors require more rigorous statistical analysis (eg, Krippendorff’s alpha or Cohen’s kappa). 19 Detailed descriptions of these statistics fall outside the scope of this introduction, but it is important to note that the choice depends on the number of coders, the sample size, and the type of data to be analyzed.
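To illustrate why percentage agreement alone can mislead, here is a minimal sketch comparing it with Cohen's kappa via scikit-learn's cohen_kappa_score; the two coders' labels for ten meaning units are invented.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels assigned independently by two coders to the
# same ten meaning units.
coder_1 = ["indication", "indication", "family history", "contraindications",
           "indication", "family history", "contraindications", "indication",
           "family history", "indication"]
coder_2 = ["indication", "family history", "family history", "contraindications",
           "indication", "family history", "indication", "indication",
           "family history", "indication"]

# Raw percentage agreement ignores agreement expected by chance.
agreement = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)

# Cohen's kappa corrects for chance agreement between two coders.
kappa = cohen_kappa_score(coder_1, coder_2)
print(f"Percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```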

Latent Content Analysis

Latent content analysis is another option for pharmacy educators, especially when there are theoretical frameworks or lenses the educator proposes to apply. Such frameworks describe and provide structure to complex concepts and may often be derived from relevant theories. Latent content analysis requires that the researcher is intimately involved in interpreting and finding meaning in the text because meaning is not readily apparent on the surface. 10 To illustrate a latent content analysis using a combination of a priori and emergent codes, we will use the example of a transcribed video excerpt from a student pharmacist interaction with a standardized patient. In this example, the goal is for first-year students to practice talking to a customer about an over-the-counter medication. The case is designed to simulate a customer at a pharmacy counter, who is seeking advice on a medication. The learning objectives for the pharmacist in-training are to assess the customer’s symptoms, determine if the customer can self-treat or if they need to seek out their primary care physician, and then prescribe a medication to alleviate the patient’s symptoms.

To begin, pharmacy educators conducting educational research should first identify what they are looking for in the video transcript. In this case, because the primary outcome for this exercise is aimed at assessing the “soft skills” of student pharmacists, codes are created using the counseling rubric created by Horton and colleagues. 20 Four a priori codes are developed using the literature: empathy, patient-friendly terms, politeness, and positive attitude. However, because the original four codes are inadequate to capture all areas representing the skills the instructor is looking for during the process of analysis, four additional codes are also created: active listening, confidence, follow-up, and patient at ease. Figure 3 presents the video transcript with each of the codes assigned to the meaning units in bolded parentheses.


A Transcript of a Student’s (JR) Experience with a Standardized Patient (SP) in Which the Codes are Bolded in Parentheses

Following the initial coding using these eight codes, the codes are consolidated to create categories, which are depicted in the taxonomy in Figure 4 . Categories are relationships between codes that represent a higher level of abstraction in the data. 18 To reach conclusions and interpret the fundamental underlying meaning in the data, categories are then organized into themes ( Figure 1 ). Once the data are analyzed, the instructor can assign value to the student’s performance. In this case, the coding process determines that the exercise demonstrated both positive and negative elements of communication and professionalism. Under the category of professionalism, the student generally demonstrated politeness and a positive attitude toward the standardized patient, indicating to the reviewer that the theme of perceived professionalism was apparent during the encounter. However, there were several instances in which confidence and appropriate follow-up were absent. Thus, from a reviewer perspective, the student's performance could be perceived as indicating an opportunity to grow and improve as a future professional. Typically, there are multiple codes in a category and multiple categories in a theme. However, as seen in the example taxonomy, this is not always the case.


Example of a Latent Content Analysis Taxonomy

If the educator is interested in conducting a latent projective analysis, after identifying the construct of “soft skills,” the researcher allows each coder to apply their own mental schema as they look for positive and negative indicators of the non-technical skills they believe a student should develop. Mental schema are the cognitive structures that provide organization to knowledge, which in this case allows coders to categorize the data in ways that fit their existing understanding of the construct. The coders will use their own judgement to identify the codes they feel are relevant. The researcher could also choose to apply a theoretical lens to more effectively conceptualize the construct of “soft skills,” such as Rogers’ humanism theory, and more specifically, concepts underlying his client-centered therapy. 21 The role of theory in both latent pattern and latent projective analyses is at the discretion of the researcher, and often is determined by what already exists in the literature related to the research question. Typically, though, in latent pattern analyses theory is used for deductive coding, while in latent projective analyses underdeveloped theory is used first to deduce codes, and the results are then used inductively to strengthen the theory applied. For our example, Rogers describes three salient qualities to develop and maintain a positive client-professional relationship: unconditional positive regard, genuineness, and empathetic understanding. 21 For the third element, specifically, the educator could look for units of meaning that imply empathy and active listening. For our video transcript analysis, this is evident when the student pharmacist demonstrated empathy by responding, "Yeah, I understand," when discussing aggravating factors for the patient's condition. The outcome for both latent pattern and latent projective content analysis is to discover the underlying meaning in a text, such as social rules or mental models. In this example, both pattern and projective approaches can discover interpreted aspects of a student’s abilities and mental models for constructs such as professionalism and empathy. The difference in the approaches is where the precedence lies: in the belief that a pattern is recognizable in the content, or in the mental schema and lived experiences of the coder(s). To better illustrate the differences in the processes of latent pattern and projective content analyses, Figure 5 presents a general outline of each method beginning with the creation of codes and concluding with the generation of themes.


Flow Chart of the Stages of Latent Pattern and Latent Projective Content Analysis

How to Choose a Methodological Approach to Content Analysis

To determine which approach a researcher should take in their content analysis, two decisions need to be made. First, researchers must determine their goal for the analysis. Second, the researcher must decide where they believe meaning is located. 14 If meaning is located in the discrete elements of the content that are easily identified on the surface of the text, then manifest content analysis is appropriate. If meaning is located deep within the content and the researcher plans to discover context cues and make judgements about implied meaning, then latent content analysis should be applied. When designing the latent content analysis, a researcher then must also identify their focus. If the analysis is intended to identify a recognizable truth within the content by uncovering connections and characteristics that all coders should be able to discover, then latent pattern content analysis is appropriate. If, on the other hand, the researcher will rely heavily on the judgment of the coders and believes that interpretation of the content must leverage the mental schema of the coders to locate deeper meaning, then latent projective content analysis is the best choice.

To demonstrate how a researcher might choose a methodological approach, we have presented a third example of data in Figure 6 . In our two previous examples of content analysis, we used student data. However, faculty data can also be analyzed as part of educational research or for faculty members to improve their own teaching practices. Recall in the video data analyzed using latent content analysis, the student was tasked to identify a suitable over-the-counter medication for a patient complaining of heartburn symptoms. We have extended this example by including an interview with the pharmacy educator supervising the student who was videotaped. The goal of the interview is to evaluate the educator’s ability to assess the student’s performance with the standardized patient. Figure 6 is an excerpt of the interview between the course instructor and an instructional coach. In this conversation, the instructional coach is eliciting evidence to support the faculty member’s views, judgements, and rationale for the educator’s evaluation of the student’s performance.


A Transcript of an Interview in Which the Interviewer (IN) Questions a Faculty Member (FM) Regarding Their Student’s Standardized Patient Experience

Manifest content analysis would be a valid choice for this data if the researcher was looking to identify evidence of the construct of “instructor priorities” and defined discrete codes that described aspects of performance such as “communication,” “referrals,” or “accurate information.” These codes could be easily identified on the surface of the transcribed interview by identifying keywords related to each code, such as “communicate,” “talk,” and “laugh,” for the code of “communication.” This would allow coders to identify evidence of the concept of “instructor priorities” by sorting through a potentially large amount of text with predetermined targets in mind.
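A minimal sketch of that kind of keyword-driven manifest coding appears below. The keywords for "communication" come from the paragraph above; the keyword lists for the other two codes, and the sample sentence, are invented for illustration.

```python
# Keyword rules mapping surface terms to predetermined manifest codes.
code_keywords = {
    "communication": ["communicate", "talk", "laugh"],
    "referrals": ["refer", "physician"],            # hypothetical keywords
    "accurate information": ["correct", "dose"],    # hypothetical keywords
}

def code_unit(unit: str) -> list[str]:
    """Return every code whose keywords appear in a meaning unit."""
    lowered = unit.lower()
    return [code for code, kws in code_keywords.items()
            if any(kw in lowered for kw in kws)]

print(code_unit("She was able to talk with the patient and correct the dose."))
# ['communication', 'accurate information']
```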

To conduct a latent pattern analysis of this interview, researchers would first immerse themselves in the data to identify a theoretical framework or concepts that represent the area of interest so that coders could discover an emerging truth underneath the surface of the data. After immersion in the data, a researcher might believe it would be interesting to more closely examine the strategies the coach uses to establish rapport with the instructor as a way to better understand models of professional development. These strategies could not be easily identified in the transcripts if read literally, but by looking for connections within the text, codes related to instructional coaching tactics emerge. A latent pattern analysis would require that the researcher code the data in a way that looks for patterns, such as a code of “facilitating reflection,” that could be identified in open-ended questions and other units of meaning where the coder saw evidence of probing techniques, or a code of “establishing rapport” for which a coder could identify nonverbal cues such as “[IN leans forward in chair].”

Conducting latent projective content analysis might be useful if the researcher was interested in using a broader theoretical lens, such as Mezirow’s theory of transformative learning. 22 In this example, the faculty member is understood to have attempted to change a learner’s frame of reference by facilitating cognitive dissonance or a disorienting experience through a standardized patient simulation. To conduct a latent projective analysis, the researcher could analyze the faculty member’s interview using concepts found in this theory. This kind of analysis will help the researcher assess the level of change that the faculty member was able to perceive, or expected to witness, in their attempt to help their pharmacy students improve their interactions with patients. The units of meaning and subsequent codes would rely on the coders to apply their own knowledge of transformative learning because of the absence in the theory of concrete, context-specific behaviors to identify. For this analysis, the researcher would rely on their interpretations of what challenging educational situations look like, what constitutes cognitive dissonance, or what the faculty member is really expecting from his students’ performance. The subsequent analysis could provide evidence to support the use of such standardized patient encounters within the curriculum as a transformative learning experience and would also allow the educator to self-reflect on his ability to assess simulated activities.

OTHER ASPECTS TO CONSIDER

Navigating Terminology

Among the methodological approaches, there are other terms for content analysis that researchers may come across. Hsieh and Shannon 10 proposed three qualitative approaches to content analysis: conventional, directed, and summative. These categories were intended to explain the role of theory in the analysis process. In conventional content analysis, the researcher does not use preconceived categories because existing theory or literature are limited. In directed content analysis, the researcher attempts to further describe a phenomenon already addressed by theory, applying a deductive approach and using identified concepts or codes from existing research to validate the theory. In summative content analysis, a descriptive approach is taken, identifying and quantifying words or content in order to describe their context. These three categories roughly map to the terms of latent projective, latent pattern, and manifest content analyses respectively, though not precisely enough to suggest that they are synonyms.

Graneheim and colleagues 9 reference the inductive, deductive, and abductive methods of interpretation in content analysis, which are data-driven, concept-driven, and fluid between both data and concepts, respectively. Manifest content analysis most often (but not always) produces phenomenological descriptions through deductive interpretation, while latent content analysis most often (but not always) produces interpretations through inductive or abductive reasoning. Erlingsson and Brysiewicz 23 refer to content analysis as a continuum, progressing as the researcher develops codes, then categories, and then themes. We present these alternative conceptualizations of content analysis to illustrate that the literature on content analysis, while incredibly useful, presents a multitude of interpretations of the method itself. However, these complexities should not dissuade readers from using content analysis. Identifying what you want to know (ie, your research question) will effectively direct you toward your methodological approach. That said, we have found the most helpful aid in learning content analysis is the application of the methods we have presented.

Ensuring Quality

The standards used to evaluate quantitative research are seldom used in qualitative research. The terms “reliability” and “validity” are typically not used because they reflect the positivist quantitative paradigm. In qualitative research, the preferred term is “trustworthiness,” which comprises the concepts of credibility, transferability, dependability, and confirmability, and researchers can take steps in their work to demonstrate that they are trustworthy. 24 Though establishing trustworthiness is outside the scope of this article, novice researchers should be familiar with the necessary steps before publishing their work. This suggestion includes exploration of the concept of saturation, the idea that researchers must demonstrate they have collected and analyzed enough data to warrant their conclusions, which has been a focus of recent debate in qualitative research. 25

There are several threats to the trustworthiness of content analysis in particular. 14 We will use the terms “reliability and validity” to describe these threats, as they are conceptualized this way in the formative literature, and it may be easier for researchers with a quantitative research background to recognize them. Though some of these threats may be particular to the type of data being analyzed, in general, there are risks specific to the different methods of content analysis. In manifest content analysis, reliability is necessary but not sufficient to establish validity. 14 Because there is little judgment required of the coders, lack of high inter-rater agreement among coders will render the data invalid. 14 Additionally, coder fatigue is a common threat to manifest content analysis because the coding is clerical and repetitive in nature.

For latent pattern content analysis, validity and reliability are inversely related. 14 Greater reliability is achieved through more detailed coding rules to improve consistency, but these rules may diminish the accessibility of the coding to consumers of the research. This is defined as low ecological validity. Higher ecological validity is achieved through greater reliance on coder judgment to increase the resonance of the results with the audience, yet this often decreases the inter-rater reliability. In latent projective content analysis, reliability and validity are equivalent. 14 Consistent interpretations among coders both establish and validate the constructed norm; construction of an accurate norm is evidence of consistency. However, because of this equivalence, issues with low validity or low reliability cannot be isolated. A lack of consistency may result from coding rules, lack of a shared schema, or issues with a defined variable. Reasons for low validity cannot be isolated, but they will always result in low consistency.

Any good analysis starts with a codebook and coder training. It is important for all coders to share the mental model of the skill, construct, or phenomenon being coded in the data. However, when conducting latent pattern or projective content analysis in particular, micro-level rules and definitions of codes increase the threat to ecological validity, so it is important to leave enough room in the codebook and during the training to allow a shared mental schema to emerge in the larger group rather than being strictly directed by the lead researcher. Stability is another threat, which occurs when coders make different judgments as time passes. To reduce this risk, allowing for recoding at a later date can increase the consistency and stability of the codes. Reproducibility is not typically a goal of qualitative research, 15 but for content analysis, codes that are defined both prior to and during analysis should retain their meaning. Researchers can increase the reproducibility of their codebook by creating a detailed audit trail, including descriptions of the methods used to create and define the codes, materials used for the training of the coders, and steps taken to ensure inter-rater reliability.

In all forms of qualitative analysis, coder fatigue is a common threat to trustworthiness, even when the instructor is coding individually. Over time, the cases may start to look the same, making it difficult to refocus and look at each case with fresh eyes. To guard against this, coders should maintain a reflective journal and write analytical memos to help stay focused. Memos might include insights that the researcher has, such as patterns of misunderstanding, areas to focus on when considering re-teaching specific concepts, or specific conversations to have with students. Fatigue can also be mitigated by occasionally talking to participants (eg, meeting with students and listening for their rationale on why they included specific pieces of information in an assignment). These are just examples of potential exercises that can help coders mitigate cognitive fatigue. Most researchers develop their own ways to prevent the fatigue that can seep in after long hours of looking at data. But above all, a sufficient amount of time should be allowed for analysis, so that coders do not feel rushed, and regular breaks should be scheduled and enforced.

Qualitative content analysis is both accessible and high-yield for pharmacy educators and researchers. Though some of the methods may seem abstract or fluid, the nature of qualitative content analysis encompasses these concerns by providing a systematic approach to discover meaning in textual data, both on the surface and implied beneath it. As with most research methods, the surest path towards proficiency is through application and intentional, repeated practice. We encourage pharmacy educators to ask questions suited for qualitative research and to consider the use of content analysis as a qualitative research method for discovering meaning in their data.


15. Materials-Based Methods

15.1. Content Analysis

Learning Objectives

  • Define content analysis.
  • Describe qualitative and quantitative strategies employed in content analysis.
  • Understand how to present the results from content analysis.

As we noted earlier, content analysis is a materials-based research method that focuses on texts and their meanings. Sociologists use a more expansive definition of “text” than the word typically has. In a research context, the content being analyzed is essentially any recorded communication. This would obviously include actual written copy, such as news articles or email messages, but we can consider content that we might see or hear—such as a speech, dance performance, television show, advertisement, or movie—to be “texts” as well. Table 15.1 provides some examples of the kinds of data that sociologists have studied using content analysis techniques.

Table 15.1. Examples of Content Analysis

One thing you might notice is that the data sources described in this table are primary sources . As you may remember from Chapter 5: Research Design , primary sources are original works representing first-hand experiences, often written by individuals who were present at a noteworthy event or had relevant experiences. Primary sources that could be studied through content analysis include personal journals, emails, letters, government documents, speeches, television commercials, social media posts, and news articles published at the time of an event of interest.

While content analysis usually focuses on primary sources, there are also examples of studies that use secondary sources (which draw upon primary sources for information, such as academic publications, biographies, and news articles that review other media or research) and tertiary sources (which summarize the results of secondary sources). With these sources, researchers might be interested in how the person or persons who generated the secondary or tertiary source reached their conclusions about the topic in question, or how their decisions about presenting information might shape people’s understandings. For example, Myra Marx Ferree and Elaine Hall (1990) conducted a content analysis of introductory sociology textbooks to learn how students were being taught sociology. As part of their study, the researchers examined the images being presented to students and what messages they conveyed. They concluded that “women were not represented in numbers proportionate to their distribution in the population” and people of color “would have been numerically underrepresented” in images had the textbooks not included chapters specifically on race (Ferree and Hall 1990:529, 528).

Sometimes students new to research methods struggle to grasp the difference between a content analysis of primary scholarly literature and a review of literature on a sociological topic. As we discussed in Chapter 5: Research Design, a literature review examines sources to understand what we know and don’t know about a particular topic. These sources are typically peer-reviewed, written by trained scholars, and published by an academic journal or press. They are primarily papers that elaborate on scholarly theories or present empirical research conducted using accepted techniques of data collection and analysis for the discipline. The researcher synthesizes these sources to arrive at some conclusion about social scientists’ overall knowledge about a topic—often as a prelude to designing their own study on that topic.

A content analysis of research studies would use its sources to pursue very different questions. In short, it would be a “study of the studies” as opposed to a “review of studies.” For example, a content analysis of scholarly literature might ask whether the top-ranking journals in a particular discipline disproportionately publish work by men. The researcher might gather a sample of articles published in different journals and count their authors by gender (though this may be a tricky prospect if relying only on names to indicate gender). Another content analysis of academic studies might examine whether and how the topics that scholars study go in and out of style. A researcher tackling this research question could look at the articles published in various journals, code the topics they cover, and see if there is any change in the most popular topics over the years. Unlike with literature reviews, the researchers in these examples are not examining the content of their sources to identify a gap in the literature and design a study around it. Instead, they are looking to learn about what the publication of these articles says about the scientific community that published them.

Qualitative versus Quantitative Approaches to Content Analysis

Collage of photographs of six leading ladies (i.e., “Bond Girls”) from James Bond films.

Content analysis can be quantitative or qualitative, and often researchers will use both strategies to strengthen their investigations. Quantitative content analysis focuses on variables whose characteristics can be counted. For example, Kimberly Neuendorf and her colleagues (2010) reviewed 20 films featuring the fictional British spy James Bond. They examined the portrayals of women across the films—195 female characters in total—and tracked particular details about each character. For example, a variable they called “role prominence” assessed whether the female character’s part was minor, medium, or major. Other variables counted the number of times weapons were used by and against a female character. Such a quantitative content analysis uses the same techniques we covered in Chapter 14: Quantitative Data Analysis. Neuendorf and her coauthors, for instance, calculated frequencies for their three categories of “role prominence,” finding that 52 percent of female roles were minor, 30 percent medium, and 17 percent major. They generated both univariate and multivariate statistics—for instance, determining that 25 percent of female characters had a weapon used against them, and that their own use of weapons was related to their levels of sexual activity within the films.

In qualitative content analysis, the aim is to identify themes in the text and examine the underlying meanings of those themes. Tony Chambers and Ching-hsiao Chiang (2012) used such an approach in their content analysis of open-ended comments in the National Survey of Student Engagement. First, the researchers identified passages in the comments where students appeared to be raising issues standing in the way of their academic success. Through the coding process (see Chapter 11: Qualitative Data Analysis), they settled upon certain key themes across these passages, which could be broadly grouped into categories of academic needs, the campus environment, financial issues, and student services. The approach here was inductive, in that the researchers did not determine these themes beforehand, but rather allowed them to emerge during the coding process. Ultimately, this analysis allowed Chambers and Chiang to highlight certain disconnects between the goals of colleges and universities and students’ actual experiences.

Note that both quantitative and qualitative approaches can easily be used in the same content analysis. For example, Vaughn Crichlow and Christopher Fulcher (2017) analyzed coverage in the New York Times and USA Today about the deaths of three African American men during encounters with police: Eric Garner, Michael Brown, and Freddie Gray. The content analysis focused on the quotes of experts (public officials and pundits) that appeared in these news articles. First, the researchers examined the quotes to generate themes, and then they tallied the numbers of quotes that fit the noteworthy themes they identified (as shown in this example, content analysis can move rather seamlessly from inductive qualitative analysis to deductive quantitative analysis). As shown in Figure 15.1, the analysis found that the experts quoted across these articles rarely discussed strategies to reduce police shootings and improve police-community relations in communities of color.

Figure 15.1. Table indicating that 75 percent of experts’ quotes did not discuss reducing police shootings or improving police-community relations after Eric Garner’s death in Staten Island, 77 percent did not do so after Michael Brown’s death in Ferguson, and 75 percent did not do so after Freddie Gray’s death in Baltimore.

One critical aspect of content analysis is the development of a comprehensive and detailed codebook. For the study of James Bond films, for instance, the eight coders assigned values to the study’s variables drawing upon a codebook the authors had created well before any analysis began. Especially when many people are involved with coding, there needs to be a shared understanding of what exactly the codes mean to avoid divergent interpretations of the same data and maintain inter-coder reliability (a topic we will return to).

Defining the Scope of a Content Analysis

Photograph of a mural in Austin, Texas, that depicts Colin Kaepernick kneeling in front of images of George Floyd, Ahmaud Arbery, Eric Garner, Tamir Rice, Trayvon Martin, Breonna Taylor, and Mike Ramos.

For many studies that rely on content analysis, setting boundaries on what content to study is challenging because there is just so much content available to researchers. Consider a study by Jules Boykoff and Ben Carrington (2020) that analyzed media coverage of American football player Colin Kaepernick’s protests in 2016. By kneeling during the playing of the national anthem at numerous NFL games, Kaepernick sought to make a public statement about ongoing police brutality against African Americans. For their study of reactions to these protests, the researchers examined “the media framing contests that emerged between Colin Kaepernick and his supporters on one side and his detractors on the other, including President Donald Trump” (Boykoff and Carrington 2020:832).

If the idea is to study how the media presents a particular issue relating to racial inequality, however, we need to be very specific about what we mean by “media.” The term covers a lot of territory, even if we restrict it to just “mass media”—news articles, radio and television broadcasts, and the like. Conceivably, we could analyze reports about Kaepernick in hundreds of sources across a wide range of media platforms. Clearly, however, that is not a realistic option for any study, even a well-funded one.

To make our content analysis feasible, we need to precisely define the scope conditions of our study. In Chapter 3: The Role of Theory in Research, we talked about how all theories have scope conditions, which tell us where the theory can and cannot be applied. Among other things, a theory is constrained (delimited) based on the contexts where it has been studied empirically. For instance, a theory developed from data in the United States may not necessarily apply in other societies. When we are designing an empirical study, we need to think about the reverse consideration—what can we feasibly study? How should we set the boundaries of our data collection? These are the scope conditions of our study.

For their study, Boykoff and Carrington decided to study print and online stories from newspapers—thereby reducing the vast category of “media” to the smaller (and dwindling) subcategory of “newspapers.” But which newspapers? In 2018, when Boykoff and Carrington were working on their analysis, there were 1,279 daily newspapers operating across the country. Though this was fewer than in previous years, examining multiple articles in all of these publications would clearly have been an overwhelming task. Instead, the researchers decided to choose the four national newspapers with the highest circulation numbers—USA Today, the Wall Street Journal, the New York Times, and the Los Angeles Times—plus the seventh-ranked paper, the Washington Post. Their rationale for the first four had to do with the national reach and influence of these papers. They chose to include the lower-circulation Post as well because of its in-depth coverage of issues and its numerous online articles. Note that for any content analysis, you should be similarly specific about your rationale for including or excluding certain sources. Always provide a clear and compelling justification—detailed at length in your paper’s methods section—for these methodological choices, rather than just saying these were the most convenient sources to analyze. Convenience can be an important consideration, but you should be able to speak thoughtfully about the benefits and drawbacks of the boundaries you set.

Even after Boykoff and Carrington settled on five newspaper sources to analyze, they still had another set of methodological decisions to make: what particular time period should they set for the Kaepernick-related articles they would analyze? Clearly there would be no coverage before the start of Kaepernick’s protests on August 14, 2016, so they could easily set a starting point for their analysis. The ideal ending point was less self-evident, however, since newspapers obviously don’t coordinate with each other about when to stop coverage of a story. For their study, the researchers chose to examine news articles within a two-year period ending on August 14, 2018. By that time, they reasoned, the controversy had diminished, and no other NFL team had signed Kaepernick. This brings up another important point about setting scope conditions for a study: sometimes the decisions can be more or less arbitrary. While a two-year period is reasonable and justifiable, you could imagine how other researchers might make a decent case for a longer or shorter period. As a researcher, you will need to trust your own judgment and be able to defend your decisions, even if they are somewhat a matter of personal preference.

Now that Boykoff and Carrington had settled on a time period for their study, were they finally done? No. They still had to decide whether they should study all articles published during this period or be more selective. They decided to apply certain inclusion criteria to make it more likely that the articles they analyzed would contain material relevant to their research question of how the media framed Kaepernick’s actions. In their first sweep, they searched the archives of each newspaper for the terms “Kaepernick” and “protest.” Then they narrowed that list to just those articles that were at least five paragraphs long and mentioned Kaepernick somewhere in the first five paragraphs of text, reasoning that these articles would be more likely to say something substantive about the controversy. By defining the scope of their content analysis in all these ways, Boykoff and Carrington ensured that closely reviewing their materials was something they could do with a reasonable amount of time and effort. At the same time, they maximized the likelihood that their analysis would capture the vast majority of newspaper articles that put forward a useful (for their research purposes) perspective on the protests.

Let’s look at another example. We previously mentioned a published content analysis of gender and racial representations within the images published in introductory sociology textbooks (Ferree and Hall 1990). To identify sources for their analysis, the authors consulted the 1988 volume of Books in Print, a comprehensive database of books published in a particular year. They searched the database with the keywords “sociology” and “sociological” and obtained the most recent editions of textbooks whose titles contained those words.

Through this process of culling, the authors dramatically reduced the number of textbooks they had to analyze. However, they had another decision to make: should they analyze all the content within each textbook? For their research purposes, they decided to ignore the written content in the textbooks and instead focus on the accompanying images. Their reasoning here—which, as with any methodological decision, could be debated—was that the photographs and illustrations vividly captured “the currently acceptable conceptualization of race and gender, as constructed in introductory sociology textbooks by authors and publishers” (501). Because of this specification, the researchers were able to eliminate textbooks that did not contain any photographs or illustrations. Their final sample included 33 introductory sociology textbooks with a total of 5,413 illustrations, all of which were subjected to analysis.

Even after you have defined the overall scope of your content analysis, you can be even more selective about the texts you examine by applying a sampling process. Say that your research question is, “How often do presidents of the United States mention domestic issues as opposed to international issues in their State of the Union addresses?” Luckily, a comprehensive list of State of the Union addresses that includes their full text is easy to locate on the internet. With all these sources readily available to you, you might decide you want to analyze every single speech. In this case, you would not need to select a sample. An examination of all of the possible elements in a population is called a census. The content analysis of James Bond films we mentioned earlier fits this category (Neuendorf et al. 2010). The authors conducted a census of every James Bond film available to them—from Dr. No (1962) starring Sean Connery, to Die Another Day (2002) starring Pierce Brosnan. (Their analysis went up to 2005, so they did not cover any of the Daniel Craig films.) Given the relatively small number of films they needed to analyze—just 20—the researchers were able to include all the elements in their target population.

Indeed, what resources you have available to you will often dictate whether you can study all possible sources or need to draw a smaller sample. Even though it is easy to identify and locate all 200-some State of the Union addresses, for instance, you may not have the time or energy to analyze every single one. For the hypothetical study mentioned earlier, let’s say you look over some of the State of the Union addresses, and given how long and dense many of them are, you decide you can only analyze 20 of them. So how do you decide which 20 to look at?

You can select a sample by using either probability or nonprobability sampling methods, as described in Chapter 6: Sampling. Let’s walk through the process of drawing a simple random sample of 20 State of the Union addresses (a short code sketch of this procedure follows the list):

  • As of this writing, there have been 233 State of the Union addresses. So you should start by creating a list of all 233 addresses and then numbering them from 1 to 233.
  • Generate 20 random numbers in the range of 1 to 233 using a website such as random.org.
  • Go down your list of random numbers, choosing the State of the Union addresses whose numbering in your original list corresponds to each random number. Ignore any duplicate number (generating a new random number to replace it) and keep going until you have selected 20 addresses.
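
In practice, you can automate this selection in a few lines of code. Below is a minimal Python sketch, assuming the addresses are simply numbered 1 through 233; because random.sample draws without replacement, duplicate numbers never come up and no replacement step is needed.

```python
import random

random.seed(42)  # optional: fixing the seed makes the sample reproducible

N_ADDRESSES = 233  # total State of the Union addresses as of this writing
SAMPLE_SIZE = 20

# Draw 20 distinct address numbers from 1..233 (sampling without replacement)
sample_ids = random.sample(range(1, N_ADDRESSES + 1), SAMPLE_SIZE)
print(sorted(sample_ids))
```

The sorted list of numbers tells you which addresses from your original list belong in the sample, exactly as in the manual procedure above.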

A key advantage of using probability sampling techniques, as you might remember, is that they allow you to generate statistics that estimate the population values—giving you a sense of how certain characteristics are distributed and whether certain relationships exist in the actual population. So if you selected your 20 State of the Union addresses through a simple random sampling procedure (or some other probability sampling technique), you would be able to generalize the results from your sample of 20 addresses to the population of all 200-some addresses. If you found that text relating to international issues was just a small percentage of the total text across your 20 State of the Union addresses, for instance, you could say with some confidence that a similar percentage (within a margin of sampling error) applied to discussion of international issues across all State of the Union addresses.

Content analyses can also rely on samples that were not drawn using probability sampling techniques. As we described in Chapter 6: Sampling, the best choice among nonprobability sampling techniques is usually purposive sampling, which uses theory to guide the researcher in deciding upon what elements to include. For our example research question about the extent to which presidents discuss international and domestic issues in their State of the Union addresses, a purposive sample might include the most recent 10 addresses delivered in wartime and the most recent 10 delivered in peacetime. This sampling choice would allow you to understand how the prevailing context of war or peace influences how presidents talk about international issues in their public statements. Note, however, that your nonprobability sample here would not allow you to make statistical estimates of population parameters for all 233 addresses.

Even if you choose to study all the units in your population of interest, you may still have a use for sampling techniques. Researchers often select smaller samples from their populations in order to develop coding schemes. For example, Dina Borzekowski and her colleagues (2010) did a comprehensive internet search to identify 180 active websites that endorsed and supported eating disorders. Before they started going through all 180 websites, however, the researchers reviewed the existing literature and used it to develop a preliminary coding scheme that described a number of relevant variables that they might measure within the data. They then tested that proposed scheme on a random sample of 25 of the 180 websites. The data from this pilot test helped the researchers develop a codebook with detailed coding guidelines, which the team of six researchers then used to code the entire sample in a consistent fashion.

A similar approach of testing coding schemes with samples can be used for qualitative content analysis. After compiling their sample of newspaper articles, Boykoff and Carrington (2020:833) selected a random 30-article subsample to help them identify the frames that their media sources applied to coverage of Colin Kaepernick’s kneeling protests. One of the study’s authors and a research assistant carefully read all articles in that subsample and inductively generated a list of relevant frames. After consulting with one another, the researchers boiled down their list to seven predominant frames and three competing frames, which they then used to code the entire sample of 301 articles.

Coding in Quantitative Content Analysis

When researchers code a text for a content analysis, they are looking for two types of content: manifest and latent content. Manifest content has a more or less obvious significance to it. It presents a surface-level meaning that can be readily discerned. Latent content refers to the underlying meaning of the surface content we observe. Coding manifest content typically involves a highly structured process—think of the very precise ways, for example, that the study of James Bond films we mentioned earlier tallied the cases when women were targets of or wielders of weapons (Neuendorf et al. 2010). Qualitative researchers, however, are usually not interested in manifest content in itself, but as a stepping stone to deeper understanding, which they pursue with the coding of latent content.

Say that we are studying advertisements for kitchen utensils that have been published in magazines over the last several decades. One type of manifest content to code across these ads might be the stated purpose of each featured utensil. For instance, the text in the ads might emphasize how the utensil—say, a potato masher—makes preparing mashed potatoes a cinch for Thanksgiving dinners. Or it might focus instead on the general versatility and efficiency of that potato masher in dealing with russet, red, and Yukon Gold potatoes, or even the dreaded yam. We could also make note of how users of the utensils are depicted within the ads, using codes like “entertaining at home” when we see someone in the ad using the utensil to cook for a large gathering (refer to Chapter 11: Qualitative Data Analysis for a review of how to write up codes). In turn, one set of latent codes that could emerge from these manifest codes would be our assessment of the lifestyles that the ad—by playing up certain features of the utensil—is promoting. For instance, we might see a shift over the years in the stated purpose of the utensils and the depictions of their users across the ads we are analyzing: from an emphasis on utensils designed to facilitate in-home entertaining to those designed to maximize efficiency and minimize time spent in the kitchen. We might theorize that this shift reflects a corresponding shift in how (and how much) people spend time in their homes. (See Video 15.1 for one take on this woefully understudied topic of kitchen utensils.)

Video 15.1. What Kitchen Utensils Say about Society. In the first part of this segment from the PBS documentary People Like Us, satirist Joe Queenan riffs on how kitchen utensils can serve as markers of social class—in essence, an argument that one might explore through the latent content of kitchen utensil advertisements, as we describe.

To record the observations we make during content analysis, we typically rely upon a code sheet, sometimes referred to as a tally sheet. Code sheets allow us to apply a systematic approach to our analysis of the data. For instance, let’s say we want to conduct a content analysis of kitchen utensils—this time, not the advertisements about them, but the utensils themselves (remember that any “text”—even physical objects—can be studied in a content analysis, so long as the object conveys meanings we can analyze). We happen to have access to sales records for kitchen utensils over the past 50 years. Based on these records, we generate a list of 50 utensils: the top-selling utensil in each of those 50 years. For each utensil, we use our code sheet (as shown in Table 15.2) to record its name, culinary purpose, and price in current dollar amounts (note that adjusting for inflation is crucial whenever you compare monetary amounts across years). We might also want to make some qualitative assessments about each utensil and its purpose—say, how easy or hard it is to use. To rate this difficulty of use, we use a 5-point scale, with 1 being very easy and 5 being very hard. The specific criteria we use to determine this difficulty of use should be described in our codebook, along with any other instructions for coding the study’s variables. (For space reasons, the sample sheet contains columns only for 10 years’ worth of utensils; if we were to conduct this project—and who wouldn’t want to learn more about the history of kitchen utensils?—we’d need columns for each of the 50 items in our sample.)

Table 15.2. Sample Code Sheet for Study of Kitchen Utensil Popularity Over Time

As our example shows, a code sheet can contain both qualitative and quantitative data. For instance, our ease of use row will report our difficulty rating for each utensil, a quantitative assessment. We will be able to analyze the data recorded in this row using statistical procedures of the sort outlined in Chapter 14: Quantitative Data Analysis —say, by calculating the mean value of “ease of use” for each of the five decades we are observing. We will be able to do the same thing with the data collected in the “price” row, which is also a quantitative measure. The final row of our example code sheet will contain qualitative data: notes about our impressions of the utensils we are examining. For the data in this row, conducting open and focused coding (as described in Chapter 11: Qualitative Data Analysis ) is an option. But regardless of whether we are analyzing qualitative or quantitative data, our goal will be the same: identifying patterns across our data.
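
As a minimal sketch of how the quantitative rows of such a code sheet might be analyzed, the following Python code uses the pandas library on a hypothetical (and much abbreviated) version of our utensil code sheet; the column names and values are invented for illustration:

```python
import pandas as pd

# Hypothetical code sheet data: one row per top-selling utensil
sheet = pd.DataFrame({
    "year": [1975, 1985, 1995, 2005, 2015],
    "utensil": ["potato masher", "garlic press", "salad spinner",
                "silicone spatula", "spiralizer"],
    "price_current_dollars": [14.00, 11.50, 22.00, 9.00, 25.00],  # inflation-adjusted
    "ease_of_use": [2, 1, 3, 1, 4],  # 1 = very easy, 5 = very hard
})

# Group the utensils by decade, then compute mean ratings and prices
sheet["decade"] = (sheet["year"] // 10) * 10
print(sheet.groupby("decade")[["ease_of_use", "price_current_dollars"]].mean())
```

With a full 50-utensil sheet, the same grouping code would produce the mean “ease of use” rating and price for each of the five decades, the calculation described above.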

Let’s delve more deeply into what the coding of manifest content looks like, drawing from the example of an actual published paper. In her study of scouting manuals published by the Boy Scouts and Girl Scouts, Kathleen Denny (2011) sought to understand how these two organizations communicated different expectations about gender. As we noted, the measurement process should be carefully controlled for a quantitative content analysis, and Denny was very specific in how she conceptualized and operationalized a key question in her study: did the Boy Scout and Girl Scout manuals differ in the types of activities that scouts participated in? (Refer to Chapter 7: Measuring the Social World for a review of the conceptualization and operationalization stages.)

First, Denny defined an “activity” as a set of instructions that the scout must accomplish in order to be awarded a particular badge, and “participate” as attempts to fulfill the required instructions. In terms of how the manuals differed, Denny focused on a distinction she made between “others-oriented” and “self-oriented” activities, which she conceptualized as follows:

I refer to activities explicitly instructed to take place with or for others as “others-oriented activities” and activities either explicitly instructed to take place individually or not explicitly instructed to take place in groups as “self-oriented activities.” An example of a self-oriented activity is, “Draw a floor plan of your home,” for the boys’ Engineer badge …. An example of an others-oriented activity for the girls is, “In a troop, group, or with other girls, brainstorm a list of ways you can help the environment,” for the Your Outdoor Surroundings badge … (Denny 2011:34)

Next, Denny decided on operational definitions for her concepts of “self-oriented” and “others-oriented” activities. Later in her article, she goes into detail about how these variables were measured—again, crucial context for any content analysis:

Activities coded as self-oriented are not necessarily required by the organization to be executed alone. Activities were coded as self-oriented if the activity description specifically indicated that it should be accomplished individually or implied individual action (e.g., reading) or if the activity instructions did not specifically call for the presence of others. For instance, the activity “Visit the newsroom of a newspaper” … was coded as a self-oriented activity because the instructions do not include others, whereas the activity “In a troop, group, or with other girls, brainstorm a list of ways you can help the environment” … was coded as others-oriented because the instructions expressly call for others to be present. (Denny 2011:44)

Denny reviewed 1,763 badge activities across the Boy Scout and Girl Scout handbooks. By using a structured coding process to analyze the handbooks’ manifest content and count up the activity types they signaled, she was able to conclude that boys were offered more self-oriented activities, while girls were offered more others-oriented activities. (Since the time of Denny’s study, the Boy Scouts have rebranded their flagship program as “Scouts BSA” and started admitting girls to it, which may mean that the gender expectations in their handbooks have changed—yet another reminder that the context in which we conduct our analyses limits their scope.)

Denny was the only coder for her study, so she had to be especially scrupulous about consistently applying the measurement procedure outlined earlier. When two or more coders are involved, we have more checks in place to ensure the quality of the coding process—it is less likely to be the case that one coder’s biases (however unconscious) skew the study’s interpretations of the data. However, when multiple coders are involved, researchers must develop a very detailed coding manual and actively train all the coders to ensure consistency.

Let’s walk through how you should prepare to code content for a study, drawing as needed upon examples from the content analysis we mentioned earlier of James Bond films (Neuendorf et al. 2010), which used eight coders.

  • Determine the variables you need to measure. In the Bond study, for instance, the coders had to answer a series of questions about each person who appeared on screen:

  • If a person appears in the film, is that person a character?
  • If the person is a character, what is their gender?
  • If the person is a female character, how much aggression is aimed at her?
  • If the person is a female character, what is the nature of her physical characteristics?
  • Determine the possible attributes of each variable. Remember that attributes are the different categories for a given variable. In a content analysis, these are the options that coders can choose from. Specifically, they will code particular snippets of data as reflecting those attributes. For simple variables, this process of identifying attributes is pretty straightforward. In the Bond study, the “role prominence” variable—which indicated how major or minor a female character was with regard to the events of the film overall—had just three attributes: minor, medium, or major. More complex multidimensional concepts, however, may require multiple variables to measure them. For example, the “physical characteristics” concept we just described needed to be measured using eight variables, including “hair color,” “hair length,” “glasses,” “body type,” and “level of attractiveness of physical appearance.”
  • Develop a coding manual providing precise instructions for coding. The coding manual should describe all of the study’s variables and give coders a complete understanding of how to make decisions during the coding process. This step is of paramount importance when multiple coders are used, which was the case in the Neuendorf study. (Figure 15.2 shows actual instructions from their full coding manual.) Even with just one coder, however, creating a set of clear instructions will promote consistency throughout the coding process.
  • Test the coding instructions on similar materials to revise your coding manual and train multiple coders. Before you dive into your content analysis of your sources, it’s a good idea to take materials that are somehow comparable to those sources and try out your variable definitions on that content. You can then refine your coding manual based on the outcome of that trial run. Testing the instructions is especially useful when you have multiple coders involved. That testing can serve as part of the training for coders, in addition to whatever informal or formal instruction you wish to give them concerning the coding manual. You can also use the test to check the level of agreement among coders. (In the later section Reliability and Validity in Content Analysis, we’ll discuss how you go about calculating the inter-coder reliability of the coding process for a given study.) For instance, the Neuendorf study was able to evaluate the level of agreement among its eight coders by using a Bond film (Never Say Never Again, 1983) that was not in the actual study list because it was a remake of an earlier Bond film (Thunderball, 1965). This pilot test prompted the researchers to make several changes to the coding manual and have the coders attend additional training sessions. They conducted their final training using the 1967 version of Casino Royale, which featured James Bond as a character but had not been included on the study list because it was a spy parody not produced by the same company that produced the other Bond films. For this study, the researchers were lucky to have two comparable films at their disposal that they could use to test their measurement procedures. If you cannot find a similar set of materials for such purposes, however, you can conduct a test on a source you intend to include in your study: do the trial run to refine your coding manual, delete the data, and then code the same source with the revised instructions.

Figure 15.2. Excerpt from the codebook created by Neuendorf et al. (2010) for their content analysis of James Bond movies.

Coding in Qualitative Content Analysis

Coding in qualitative content analysis is typically an inductive process. That is, researchers do not start out by precisely specifying relevant variables and attributes and creating detailed coding procedures. Instead, they let their ideas regarding conceptualization and operationalization emerge from a careful reading and consideration of the text (see Chapter 4: Research Questions for a fuller discussion of inductive analysis). In this and other ways, coding in qualitative content analysis follows much the same procedures that are used for other qualitative research methods, such as ethnographic observation and in-depth interviews. That said, keep in mind the earlier distinction we made between quantitative content analysis, which focuses on manifest content, and qualitative content analysis, which focuses on latent content. Rather than just counting more obvious details, qualitative content analysis tends to delve deeply into the data. The researchers immerse themselves in the text through careful reading, trying to get at its underlying meanings.

Sociologist Nikita Carney (2016) analyzed the Twitter debates that occurred after the deaths of two unarmed African American men, Michael Brown and Eric Garner, at the hands of police in 2014. In her paper she describes the straightforward qualitative data analysis process she followed:

I decided to use Twitter’s advanced search feature and take screenshots of selected results between December 3 and 7, 2014. The analysis process drew heavily from grounded theory [an inductive approach] in order to identify key themes…. I initially read through approximately 500 tweets from this time period to get a sense for the dialogue on Twitter at this moment in time. Based on this initial read-through, I loosely coded tweets based on whether they used “#BlackLivesMatter,” “#AllLivesMatter,” or both. I selected 100 tweets out of the initial sample of 500, consisting of approximately 30 to 35 tweets from each initial grouping that were representative of the larger sample. I conducted a close textual analysis on these 100 tweets, from which I developed more specific thematic groupings, including “call to action,” “conflict over signs,” and “shifting signs/discourse.” (Carney 2016:188–9)

Just as different people who read the same book will not necessarily have the same interpretation of the text, closely studying latent content is necessarily a subjective process. Qualitative researchers recognize this and often disclose their personal stances toward the research, in a process known as reflexivity (discussed earlier in Chapter 9: Ethnography). Carney’s paper includes a “personal reflexive statement” to this effect:

I closely followed news surrounding the deaths of Michael Brown and Eric Garner, among other victims of police violence, on mass media and social media. I also took to the streets with other activists and participated in acts of protest at my university on a daily basis. Rather than claiming to produce an “objective” analysis, I use my subjectivity to examine discourse as it unfolded on social media with the goal of better understanding the ways in which youth of color used technology to influence dominant discourse in the nation. (Carney 2016:181)

While content analysis often focuses on either latent or manifest content, the two approaches can be combined in one study. For instance, we previously discussed how Kathleen Denny’s content analysis of scouting handbooks examined manifest content—the descriptions of badge activities that could be categorized as “self-oriented” or “others-oriented”—and calculated what proportion of activities for the Boy Scouts and Girl Scouts fell into each category. But Denny also analyzed latent content in terms of how the handbooks portrayed gender. Based on this analysis, Denny (2011:27) argued that the girls were encouraged to become “up-to-date traditional women,” while boys were urged to adopt “an assertive heteronormative masculinity.” In her paper, Denny described the qualitative and inductive approach she took to arrive at this finding:

Rather than code the texts for the presence or absence of any particular trait, I assessed them holistically, attuned to themes and patterns that emerged having to do with the attitude or approach endorsed by the texts. I performed textual analyses of the girls’ and boys’ handbooks’ official statements as well as a focused comparison of a comparable pair of badges—the girls’ Model Citizen badge and the boys’ Citizen badge. I present findings from the comparison of the citizen badges because the nature of the activities offered in these badges is very similar, bringing gender differences into sharper relief. (Denny 2011:35)

Given the large amount of data that content analysis typically involves, it is often a good idea to conduct the coding using qualitative data analysis (QDA) software, which we described in Chapter 11: Qualitative Data Analysis. For instance, the content analysis mentioned earlier of undergraduate comments on a national survey (Chambers and Chiang 2012) used the NVivo software package to help researchers inductively generate codes and then consolidate, refine, and prioritize certain codes as they reviewed students’ answers to the survey’s open-ended questions.

Reliability and Validity in Content Analysis

As we noted earlier, having multiple coders can help address problems of subjectivity and bias that creep into a content analysis, but it also raises the issue of inter-coder reliability (also known as inter-rater reliability, which we discussed previously in Chapter 7: Measuring the Social World). To ensure the reliability of their measures of different variables, projects that use multiple coders should discuss the degree to which different coders agreed upon how to code each variable in the study. As we will describe, there are statistics that researchers can calculate to convey the level of agreement and disagreement among coders. At the very least, any project involving coding needs to say something about the ways the researchers ensured that their findings were not the result of a very idiosyncratic or unreliable coding process. Even if one person acts as the sole coder, having one other person code a sample of the material will allow for at least some check of inter-coder reliability.

In the study by Kimberly Neuendorf and her collaborators (2010) of women’s portrayals in James Bond films, the researchers calculated two measures commonly used to assess inter-coder reliability: multiple-coder kappa and percentage agreement among coders. The corresponding statistics for each variable in their study are shown in Figure 15.3. “Percentage agreement” is the simplest measure of inter-coder reliability: in this study, it is the number of times that the eight coders involved in the study arrived at the same variable attribute for a particular observation (of one female character in one movie, for instance), divided by the total number of observations. The details of calculating kappa are beyond the scope of this textbook, but a kappa value of .40 or higher is generally considered to be acceptable.

Figure 15.3. Inter-coder reliability calculations presented in Neuendorf et al. (2010).
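
To make the percentage agreement calculation concrete, here is a minimal Python sketch that applies the definition given above: count the observations on which every coder chose the same attribute, then divide by the total number of observations. The coder data are hypothetical.

```python
def percent_agreement(codes_by_observation):
    """Share of observations on which every coder chose the same attribute."""
    unanimous = sum(1 for codes in codes_by_observation if len(set(codes)) == 1)
    return unanimous / len(codes_by_observation)

# Hypothetical "role prominence" codes from three coders for five characters
observations = [
    ["minor", "minor", "minor"],
    ["major", "medium", "major"],
    ["medium", "medium", "medium"],
    ["minor", "minor", "medium"],
    ["major", "major", "major"],
]
print(percent_agreement(observations))  # 0.6: unanimous on 3 of 5 observations
```

For kappa, which adjusts agreement for chance, researchers generally rely on statistical software rather than hand calculation; for the two-coder case, for example, the scikit-learn library offers a cohen_kappa_score function.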

As shown in the figure, one variable—whether the female character was a “good” or “bad” person—did not meet the .4 threshold for the kappa statistic of reliability. This result isn’t surprising, given that coding the variable required a broad evaluation of whether the female character was “good” (exhibited behaviors that furthered Bond’s goals) or “bad” (exhibited behaviors that were at odds with Bond). In their paper, the researchers said they suspected that ambiguity may have arisen regarding females with minor or medium roles. They concluded that results related to this variable should be interpreted with caution. Note that the percentage agreement among coders was also low for the “good/bad” variable, but not as low as it was for another variable—whether the character had an accent—which highlights the fact that different inter-coder reliability measures can produce different results, and you may not want to rely on just one. Furthermore, evaluating inter-coder reliability only after the coding is done is not the best strategy; instead, you should test the reliability of your coding procedures before implementation, which will give you the opportunity to improve upon them and possibly keep your reliability measures high across all variables.

As you might remember from Chapter 7: Measuring the Social World, the validity of a measure refers to whether the measurement procedure actually measures the concept it is intended to measure. Several approaches to assessing validity are discussed in that earlier chapter, but we will mention two here that are highly relevant to content analysis. Face validity tells us whether it is plausible that our operational definition of a variable measures the concept it is intended to measure: “on the face of things,” does our particular measure capture the essence of that concept? In The Content Analysis Guidebook, Kimberly Neuendorf (2017:125)—the lead author of the Bond content analysis—points out that assessing a measure’s face validity might seem “deceptively simple,” but it is actually highly useful, requiring that “the researcher take a step back, so to speak, and examine the measures freshly and as objectively as possible.” In fact, Neuendorf advises researchers to go one step further in ensuring face validity: “have others review the measures, with no introduction to the purpose of the study, and have them indicate what they think is being measured, a kind of back translation from operationalization to conceptualization.”

If a measure has content validity, it is said to cover all of the domains or dimensions of a concept that it should cover. This is a critical consideration for any variable included in your content analysis, and achieving content validity may mean breaking up a complex variable into several measures, as we mentioned earlier. Consider the variable in the Bond content analysis that we said failed to meet the kappa threshold for inter-coder reliability: whether the female character was “good or bad.” The coded attributes of this variable were:

  • Starts off as good but turns bad.
  • Starts off as bad but turns good.
  • Is bad throughout the entire film.
  • Is good throughout the entire film.
  • Unable to determine.

We can see why there was ample disagreement in coding this variable, given the complexity of its underlying notion of “good/bad.” For instance, what if a female character vacillated between good and bad during the course of the film? Should that option (or others) be added to cover all the possible dimensions of this underlying concept? These are the sorts of decisions you will need to make when operationalizing key concepts in a content analysis. You need to be able to defend your operational definitions as valid measures, keeping in mind that there is not just one “right” way to set up your variables.

Presenting Content Analysis Results

As we previously alluded to, quantitative content analysis creates numerical datasets that can be analyzed using any of the statistical techniques available to sociologists (see Chapter 14: Quantitative Data Analysis for an in-depth discussion). Though advanced multivariate techniques can certainly be applied, many content analyses published in academic articles just employ univariate and bivariate techniques like the ones we’ll demonstrate now.

Earlier, we discussed a content analysis by Tony Chambers and Ching-hsiao Chiang (2012) of undergraduate comments in response to open-ended survey questions. This analysis was initially qualitative: the researchers coded the undergraduate comments to inductively generate themes. However, the coding scheme they created allowed them to go back and label each of the 843 student comments as exemplifying one or more of the study’s nine overarching code categories. Figure 15.4 shows how the researchers were able to produce univariate statistics after quantifying aspects of their data. As shown in the frequency table, academic experience represented the largest percentage of total codes.

Figure 15.4. Frequency table with a listing of nine code categories and their associated counts and percentages.
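
A frequency table like the one in Figure 15.4 can be generated directly from coded data. Here is a minimal Python sketch using hypothetical coded comments, where each comment has been tagged with one or more code categories:

```python
from collections import Counter

# Hypothetical coding results: one list of code categories per student comment
coded_comments = [
    ["academic experience"],
    ["campus environment", "student services"],
    ["academic experience", "financial issues"],
    ["academic experience"],
]

counts = Counter(code for codes in coded_comments for code in codes)
total = sum(counts.values())
for code, n in counts.most_common():
    print(f"{code}: {n} ({n / total:.0%} of all codes)")
```

Because a comment can carry more than one code, the percentages here are shares of all codes assigned rather than shares of comments, matching the figure’s reporting of percentages of total codes.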

The frequency distributions calculated through quantitative content analysis can also be summarized in graphical form. Figure 15.5 presents a chart from a content analysis by Francesca Cancian and Steven Gordon (1988). The researchers examined a sample of marital advice articles from U.S. women’s magazines published between 1900 and 1979. For their analysis, they determined whether each article presented traditional or modern norms about marriage. According to their definitions, a “traditional” perspective held that romance was immature and women should prioritize their household roles; a “modern” perspective promoted passion and impulse and preferred that women take on roles outside the household. The researchers captured the degree of “traditional” versus “modern” content across their sources by using a composite measure that drew upon various codes. The graph depicts changes over time in the percentage of articles advocating modern norms.

Figure 15.5. Line chart showing the percentage of marital advice articles from 1900 to 1979 that advocated for modern norms.

As we discussed in Chapter 14: Quantitative Data Analysis, bivariate analysis allows researchers to examine the relationships between two variables. Usually, one is considered to be the independent variable, and the other is treated as the dependent variable that changes in response. In the analysis of scouting handbooks by Kathleen Denny (2011) that we mentioned earlier, the independent variable was cultural understandings of gender (operationalized through Boy Scouts and Girl Scouts materials) and one of the dependent variables was activity type (coded as either “self-oriented” or “others-oriented”). Denny’s analysis of these two variables is depicted in the graph in Figure 15.6.

Figure 15.6. Bar chart showing the following results: 83 percent of Boy Scout activities and 70 percent of Girl Scout activities were “self-oriented,” and 17 percent of Boy Scout activities and 30 percent of Girl Scout activities were “others-oriented.”

A quantitative content analysis can also produce tables examining bivariate relationships. Figure 15.7 presents a crosstabulation table created by Carol Auster and Claire Mansbach (2012) for their content analysis of toys marketed on the Disney Store website. Disney had classified 410 of its toys as being for boys and 208 as being for girls, with another 91 toys shown on both lists (which the researchers put in a third category of being for “both boys and girls”). The crosstabulation shows the relationship between the gender category of the toys (the study’s independent variable) and the color palette used for the toys (the dependent variable).

Figure 15.7. Crosstabulation table showing the following results: for toys marketed as being for girls only, 48.7 percent featured bold colors and 51.3 percent featured pastel colors, a much different distribution than among toys for boys only (91.2 and 8.8 percent, respectively) and among toys for both boys and girls (90.1 and 9.9 percent, respectively).
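
Once the coding is complete, a crosstabulation like the one in Figure 15.7 is straightforward to produce. Here is a minimal sketch using the pandas library with a handful of hypothetical coded toys; setting normalize="index" converts the raw counts into row percentages like those reported in the figure:

```python
import pandas as pd

# Hypothetical coded data: one row per toy
toys = pd.DataFrame({
    "gender_category": ["boys", "boys", "boys", "girls", "girls", "both"],
    "color_palette":   ["bold", "bold", "pastel", "pastel", "pastel", "bold"],
})

# Row percentages: the color palette distribution within each gender category
table = pd.crosstab(toys["gender_category"], toys["color_palette"],
                    normalize="index") * 100
print(table.round(1))
```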

When presenting the results of qualitative content analysis, sociologists may also break down numerical data in the ways we have just described. However, the focus is usually on interpreting the latent meanings of the text being analyzed. As you learned in Chapter 11: Qualitative Data Analysis, qualitative researchers often use quotes from their interviews to illustrate their participants’ experiences and points of view. In a similar fashion, a qualitative content analysis will include quotes from the original sources to support the researchers’ arguments. Consider the following excerpt from the content analysis of marital advice articles mentioned earlier (Cancian and Gordon 1988:322):

In this century’s early decades, many articles explicitly expected the woman to do the “emotion work” in marriage. A 1932 article in Ladies’ Home Journal , “And So You Are Married,” told readers:

Well, whether your marriage is a success or failure depends upon you, little bride…. For marriage is not only a woman’s vocation, it is her avocation. It is her meal ticket as well as her romance…. So it is not only good ethics but good business for a young wife deliberately to set about keeping her husband in love with her.

Through the use of quotations and paraphrases, qualitative researchers can provide a richer understanding of what the texts they are analyzing are actually communicating—in this example, how marriage advice articles imparted particular beliefs about marriage to their readers.

Strengths and Weaknesses of Content Analysis

Content analysis has a number of advantages over other research methods:

  • The lack of contact with human participants avoids or minimizes some challenges that confront other methods. Content analysis involves analyzing existing texts, rather than collecting data by direct contact with human participants. As a result, methodological issues like nonresponse bias tend to be less of a concern. Ethical scrutiny is less complex than it is for studies more directly involving human participants. That said, you should always remember that the content you are analyzing was created by human beings and is therefore subject to various biases related to reactivity, social desirability, and the many other ways that people’s interests and perceptions shape what they communicate.
  • Many materials are available at low or no cost. For students in particular, content analysis is often the most convenient way of conducting original research, given that a wide variety of content sources are available online for free or via a university library’s physical or digital collections. Refer to our lists of possible materials and example studies at the beginning of this chapter and in Table 15.1 for inspiration about the types of sources you can draw upon.
  • Content analysis can easily apply qualitative and quantitative methods to the study of materials. As we’ve discussed, not only can content analysis examine texts in a quantitative or qualitative fashion, but it can easily incorporate both approaches. The coding can proceed deductively, with the coding categories created in advance, or inductively, with the categories emerging from researchers’ careful classification of the raw data. And it can move between these approaches, as shown in previous examples—for instance, by starting with the inductive generation of codes and then going back to the data to quantify the existence of certain patterns based on those codes.
  • Longitudinal analysis may be possible. Researchers often use content analysis with historical materials, allowing them to provide insights into the evolution of social structures and processes over time. For instance, the content analysis described earlier of marital advice articles in magazines was able to cover a huge span of years—from 1900 to 1979—due to its tight but productive focus, and the result was a sweeping analysis of changes in emotional norms in marriage over decades (Cancian and Gordon 1988). (Note: we discuss historical-comparative research in more depth in the next section.)
  • Content analysis has a high degree of flexibility. Content analysis can be applied to a wide variety of materials and affords researchers substantial flexibility in the research process. Research methods that prove to be inadequate can be discarded and the materials reanalyzed using different techniques. Content analysis also offers flexibility in terms of when the research has to be conducted. Once accessed, the materials are typically available for study at any convenient time.
  • A content analysis tends to be easy to replicate and update. If researchers were systematic in selecting their sample and provided ample documentation regarding how they went about their coding, their content analysis can be easily replicated by others. This is especially true for quantitative content analysis, which involves fewer subjective interpretations of the data.

Figure 15.8. Photograph of two pages from Anne Lister’s diary from 1832.

Weaknesses of content analysis research include the following:

  • Desired materials may not be in a format that can be studied. Materials that the researcher would like to subject to content analysis may be hard to decipher. Anne Lister’s extensive diaries provide a good illustration of this problem. Lister was a well-to-do British landowner in the early nineteenth century whom historians consider to be the first “modern lesbian.” She left behind a massive set of diaries containing 5 million-plus words—the longest personal journal known in the English language. However, as shown in Figure 15.8, the handwriting in Lister’s diary entries is exceedingly hard to read. About a sixth of the content in her diaries is written in a code of her own devising, which she used to conceal details about her sexuality and intimate affairs (examples of the code are shown in the lower portions of the excerpted pages). Though the code’s decryption key has been available to scholars for many years, relatively little of the diary is available in a deciphered and digital form. As a result, a researcher would find this material impossible to use in a content analysis project unless they were personally willing to invest an enormous amount of time decoding and transcribing an appropriate sample of its pages. [1]
  • Content analysis can be very time-consuming. Even if materials are readily available, the time and effort required to sift through them and identify relevant information may exceed what you as a researcher can devote to a project. Even if a content analysis seems straightforward at the outset, closer inspection may reveal that the text is too complex or the materials too extensive for the study to be feasible. Take the State of the Union addresses we considered earlier as possible content sources. The longest written address is Jimmy Carter’s 1981 address, which is 33,667 words long. The spoken addresses are considerably shorter, but can still be thousands of words, with the longest being Joe Biden’s 2023 address of 9,216 words. A researcher who wanted to go through every written and spoken State of the Union address would need to read millions of words. As we have seen, researchers often analyze a sample of materials instead of all available materials, but this approach raises the issue of whether the sample is representative of the entire population of materials.
  • Coding may focus on relatively superficial content. Nowadays, the widespread transcription of print texts by Google Books and other services allows researchers to search entire decades’ worth of publications using keywords. Specialized text analysis software can also be used to quickly digitize print materials as needed. However, these technologies may increase the temptation to focus on manifest content—words or phrases that are easy to count across a voluminous body of sources. This approach can make for a relatively superficial analysis, without any attempt to get at the deeper meaning of the texts. (A minimal keyword-counting sketch appears after this list.)
  • Coding instructions can be difficult to develop for complex concepts. Recall the difficulty researchers encountered when they sought to measure whether the female characters featured in James Bond movies were “good or bad” (Neuendorf et al. 2010). This variable did not attain an acceptable level of inter-coder reliability, and part of the reason may have been the difficulty of determining the “goodness” or “badness” of a character when the woman did not play a major role in the film. As this example illustrates, the coding process is not always straightforward, and even a rigorous research project may be unable to produce and train people in coding instructions that avoid such ambiguity and disagreement.
  • Reliability cannot always be assessed with certainty. If we have only one researcher inductively coding the data for a qualitative content analysis, we will always wonder if another researcher—with a different outlook and biases—would have come to the same conclusions. Yet the breadth and complexity of the materials being reviewed may make it impossible to use multiple coders. For example, if we were conducting a qualitative analysis of Anne Lister’s diary, we would need to recruit coders who had enough familiarity with her life and writing style to interpret the otherwise impenetrable text. To its credit, quantitative content analysis can more easily draw upon multiple coders and calculate measures of inter-coder reliability. As we saw earlier, however, different measures can arrive at somewhat different results, and there is no surefire way of knowing whether the coding process was sufficiently reliable. (A short worked example of two such measures appears after this list.)
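
To make the manifest-content point concrete, here is a minimal sketch of the kind of keyword counting described above, assuming each document is available as a plain-text string. The keyword list and text snippets are invented for illustration; this is not any particular tool’s method.

```python
import re
from collections import Counter

# Minimal sketch of manifest-content coding: count how often each keyword
# appears in each document. Keywords and snippets are hypothetical.
KEYWORDS = {"economy", "security", "freedom"}

def keyword_counts(text):
    """Return counts of each keyword among lowercased word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token in KEYWORDS)

addresses = {
    "address_a": "The economy remains our first concern; a strong economy means security.",
    "address_b": "Freedom and security go hand in hand with a growing economy.",
}

for label, text in addresses.items():
    print(label, dict(keyword_counts(text)))
```

Counting of this kind scales easily to millions of words, which is exactly why it tempts researchers away from the slower work of interpreting latent meaning.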
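
To illustrate how different reliability measures can tell different stories about the same codes, the short worked example below computes simple percent agreement and Cohen’s kappa for two hypothetical coders rating ten characters as “good” or “bad.” The codes are invented for illustration and are not the Neuendorf et al. data.

```python
from collections import Counter

def percent_agreement(codes1, codes2):
    """Share of items on which two coders assigned the same category."""
    return sum(a == b for a, b in zip(codes1, codes2)) / len(codes1)

def cohens_kappa(codes1, codes2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes1)
    p_observed = percent_agreement(codes1, codes2)
    freq1, freq2 = Counter(codes1), Counter(codes2)
    # Chance agreement: probability two coders pick the same category at random.
    p_chance = sum(freq1[c] * freq2[c] for c in set(freq1) | set(freq2)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Two hypothetical coders rating the same ten characters "G" (good) or "B" (bad).
coder1 = list("GGGBBGGBGB")
coder2 = list("GGBBBGGBGG")
print(percent_agreement(coder1, coder2))       # 0.8 -- looks high
print(round(cohens_kappa(coder1, coder2), 2))  # 0.57 -- only moderate
```

Percent agreement looks reassuring here, while kappa, which discounts agreement expected by chance, is noticeably lower; which figure a study reports can shape how reliable its coding appears.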

Key Takeaways

  • Content analysis focuses on the study of recorded communications. The materials that can be analyzed include actual written texts, such as newspapers or journal entries, as well as visual and auditory sources, such as television shows, advertisements, or movies.
  • Content analysis usually focuses on primary sources, though in some instances it may also involve secondary sources.
  • Quantitative content analysis tends to count instances of manifest content, surface-level details about the sources. Qualitative content analysis tends to focus on the underlying meanings of the text, its latent content.
  • Code sheets are used to collect and organize data for content analysis.
  • Quantitative content analyses often present univariate and bivariate statistics in the form of tables and charts. Much like in-depth interviewing, qualitative content analyses tend to use direct or paraphrased quotes in order to describe meanings and explain relationships in the data.
Exercises

  • Identify a research question you could answer using content analysis. Now state a testable hypothesis having to do with your research question. Identify at least two potential sources of existing data you might analyze to answer your research question and test your hypothesis.
  • Create a code sheet for each of the two potential sources of data that you identified in the preceding exercise.
  • That said, images of all 7,000+ pages of Lister’s diary are available in the West Yorkshire Archive Service’s online catalog, and transcriptions of its earliest entries (1806 to 1814) are now available in PDF form. An extensive decryption and transcription project is now underway to make the entirety of the content easily accessible.

The Craft of Sociological Research by Victor Tan Chen; Gabriela León-Pérez; Julie Honnold; and Volkan Aytar is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.


Curated Resources for Research Design and Analysis

This resource provides curated training content for a non-statistical audience: students, residents, fellows, early-stage investigators, or anyone wanting to learn more about research design and analysis. The current topics provide guidance on the fundamentals of study design, and the site will expand to cover a broader range of common statistical topics.

How to Navigate

Each topic is introduced with an overview, followed by curated resources sectioned by modality: videos, websites, readings, and other relevant courses and software where available. Within each section, resources are loosely ordered by relevance and utility. Permanent links are provided where possible; however, the websites section is also equipped with archive links as an alternative in case users encounter inactive sites.

  • Designing and Refining a Research Question
  • Specific Aims
  • Confirmatory versus Exploratory Research
  • Outcomes and Endpoints

Coming Soon

Future topics include clinical trial and observational designs; understanding data through visualization and summary statistics; methods for categorical analysis; hypothesis testing theory and implementation; and sample size and power calculations.

This resource is being developed and maintained by members of Biostatistics, Epidemiology, and Research Design (BERD) at the Columbia University Irving Institute for Clinical and Translational Research and is supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant Number UL1TR001873. BERD does not take responsibility for any misuse or misinterpretation of the curated content. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.  

To suggest additional topics or resources, alert us to problems with links, or share suggestions for improvement, email [email protected].

15 April 2024

Revealed: the ten research papers that policy documents cite most

Dalmeet Singh Chawla

Dalmeet Singh Chawla is a freelance science journalist based in London.


Image: G7 leaders gather for a photo at the Itsukushima Shrine during the G7 Summit in Hiroshima, Japan, in 2023. Policymakers often work behind closed doors, but the documents they produce offer clues about the research that influences them. Credit: Stefan Rousseau/Getty

When David Autor co-wrote a paper on how computerization affects job skill demands more than 20 years ago, a journal took 18 months to consider it — only to reject it after review. He went on to submit it to The Quarterly Journal of Economics , which eventually published the work 1 in November 2003.

Autor’s paper is now the third most cited in policy documents worldwide, according to an analysis of data provided exclusively to Nature. It has accumulated around 1,100 citations in policy documents, according to figures from the London-based firm Overton (see ‘The most-cited papers in policy’), which maintains a database of more than 12 million policy documents, think-tank papers, white papers and guidelines.

“I thought it was destined to be quite an obscure paper,” recalls Autor, a public-policy scholar and economist at the Massachusetts Institute of Technology in Cambridge. “I’m excited that a lot of people are citing it.”

The most-cited papers in policy

Economics papers dominate the top ten papers that policy documents reference most.

Data from Sage Policy Profiles as of 15 April 2024

The top ten most cited papers in policy documents are dominated by economics research. When economics studies are excluded, a 1997 Nature paper 2 about Earth’s ecosystem services and natural capital is second on the list, with more than 900 policy citations. The paper has also garnered more than 32,000 references from other studies, according to Google Scholar. Other highly cited non-economics studies include works on planetary boundaries, sustainable foods and the future of employment (see ‘Most-cited papers — excluding economics research’).

These lists provide insight into the types of research that politicians pay attention to, but policy citations don’t necessarily imply impact or influence, and Overton’s database has a bias towards documents published in English.

Interdisciplinary impact

Overton usually charges a licence fee to access its citation data. But last year, the firm worked with the London-based publisher Sage to release a free web-based tool that allows any researcher to find out how many times policy documents have cited their papers or mention their names. Overton and Sage said they created the tool, called Sage Policy Profiles, to help researchers to demonstrate the impact or influence their work might be having on policy. This can be useful for researchers during promotion or tenure interviews and in grant applications.

Autor thinks his study stands out because his paper was different from what other economists were writing at the time. It suggested that ‘middle-skill’ work, typically done in offices or factories by people who haven’t attended university, was going to be largely automated, leaving workers with either highly skilled jobs or manual work. “It has stood the test of time,” he says, “and it got people to focus on what I think is the right problem.” That topic is just as relevant today, Autor says, especially with the rise of artificial intelligence.

Most-cited papers — excluding economics research

When economics studies are excluded, the research papers that policy documents most commonly reference cover topics including climate change and nutrition.

Walter Willett, an epidemiologist and food scientist at the Harvard T.H. Chan School of Public Health in Boston, Massachusetts, thinks that interdisciplinary teams are most likely to gain a lot of policy citations. He co-authored a paper on the list of most cited non-economics studies: a 2019 work 3 that was part of a Lancet commission to investigate how to feed the global population a healthy and environmentally sustainable diet by 2050. That paper has accumulated more than 600 policy citations.

“I think it had an impact because it was clearly a multidisciplinary effort,” says Willett. The work was co-authored by 37 scientists from 17 countries. The team included researchers from disciplines including food science, health metrics, climate change, ecology and evolution and bioethics. “None of us could have done this on our own. It really did require working with people outside our fields.”

Sverker Sörlin, an environmental historian at the KTH Royal Institute of Technology in Stockholm, agrees that papers with a diverse set of authors often attract more policy citations. “It’s the combined effect that is often the key to getting more influence,” he says.


Sörlin co-authored two papers in the list of top ten non-economics papers. One of those is a 2015 Science paper 4 on planetary boundaries — a concept defining the environmental limits in which humanity can develop and thrive — which has attracted more than 750 policy citations. Sörlin thinks one reason it has been popular is that it’s a sequel to a 2009 Nature paper 5 he co-authored on the same topic, which has been cited by policy documents 575 times.

Although policy citations don’t necessarily imply influence, Willett has seen evidence that his paper is prompting changes in policy. He points to Denmark as an example, noting that the nation is reformatting its dietary guidelines in line with the study’s recommendations. “I certainly can’t say that this document is the only thing that’s changing their guidelines,” he says. But “this gave it the support and credibility that allowed them to go forward”.

Broad brush

Peter Gluckman, who was the chief science adviser to the prime minister of New Zealand between 2009 and 2018, is not surprised by the lists. He expects policymakers to refer to broad-brush papers rather than those reporting on incremental advances in a field.

Gluckman, a paediatrician and biomedical scientist at the University of Auckland in New Zealand, notes that it’s important to consider the context in which papers are being cited, because studies reporting controversial findings sometimes attract many citations. He also warns that the list is probably not comprehensive: many policy papers are not easily accessible to tools such as Overton, which uses text mining to compile data, and so will not be included in the database.


“The thing that worries me most is the age of the papers that are involved,” Gluckman says. “Does that tell us something about just the way the analysis is done or that relatively few papers get heavily used in policymaking?”

Gluckman says it’s strange that some recent work on climate change, food security, social cohesion and similar areas hasn’t made it to the non-economics list. “Maybe it’s just because they’re not being referred to,” he says, or perhaps that work is cited, in turn, in the broad-scope papers that are most heavily referenced in policy documents.

As for Sage Policy Profiles, Gluckman says it’s always useful to get an idea of which studies are attracting attention from policymakers, but he notes that studies often take years to influence policy. “Yet the average academic is trying to make a claim here and now that their current work is having an impact,” he adds. “So there’s a disconnect there.”

Willett thinks policy citations are probably more important than scholarly citations in other papers. “In the end, we don’t want this to just sit on an academic shelf.”

doi: https://doi.org/10.1038/d41586-024-00660-1

1. Autor, D. H., Levy, F. & Murnane, R. J. Q. J. Econ. 118, 1279–1333 (2003).
2. Costanza, R. et al. Nature 387, 253–260 (1997).
3. Willett, W. et al. Lancet 393, 447–492 (2019).
4. Steffen, W. et al. Science 347, 1259855 (2015).
5. Rockström, J. et al. Nature 461, 472–475 (2009).


How to close the Black tech talent gap

While the number and variety of tech jobs have grown steadily over two decades, the technology workforce has not evolved to reflect the makeup of the American workforce. Organizations have worked to improve representation among Black employees and executives in technology-related jobs across industries, but there is more work to be done.

The Black technology workforce

Black people make up 12 percent of the US workforce but only 8 percent of employees in tech jobs. 1 State of the tech workforce, CompTIA, March 2022. That percentage is even smaller further up the corporate ladder; just 3 percent of technology executives in the C-suite are Black, according to a McKinsey analysis of Fortune 500 executives. 2 Based on an analysis of Fortune 500 tech executives in chief information officer or chief technology officer roles who identify as Black. That gap is likely to widen over the next decade. Across all industries, technology jobs—those in data science, engineering, cybersecurity, and software development—are expected to grow 14 percent by 2032. Black tech talent in those roles is expected to grow only 8 percent over the same period (Exhibit 1).

Developing inclusive technologies and bridging a gap worth billions

Black households stand to lose out on more than a cumulative $350 billion in tech job wages by 2030, an amount equal to one-tenth the total wealth held by those households, according to a McKinsey Institute for Black Economic Mobility analysis.

The wage gap in tech roles is expected to grow nearly 37 percent, from $37.5 billion in 2023 to $51.3 billion in annual lost wages by 2030, according to our analysis (Exhibit 2).

Increasing Black representation in technology jobs isn’t just about bridging wage gaps. It means improving the lives of those who are regularly othered, diminished, and discounted in workplaces where they may be the only Black person. It’s also about developing inclusive technologies  that have transformative potential for Black communities. For example, digital banking platforms designed to be inclusive of Black consumers provide financial services that can improve the living standards in communities underserved by traditional banks.

Businesses, nonprofit organizations, and public-sector agencies must take coordinated action to increase Black representation in tech jobs. Specifically, they should reexamine their approach at five critical junctures throughout the career journey for Black tech talent, by improving STEM education at the K–12 level, strengthening HBCU partnerships, expanding opportunities for alternatively skilled talent, replacing mentorship with sponsorship, and empowering Black leaders to thrive. Doing so will support the Black technology workforce for generations to come.

Meet STEM students where they are

Education programs focused on science, technology, engineering, and math (STEM) fields in K–12 schools have long been seen as potential feeders into the technology workforce. Programs focused on helping subsets of students began to proliferate from both the public sector and nonprofits in the 2010s; Girls Who Code and NASA’s Next Gen STEM are just two examples.

Such programs are a promising start, but there’s a lot of opportunity to do more. According to the Pew Research Center, Black students earned only 7 percent of STEM bachelor’s degrees in 2018, compared with 10 percent of all bachelor’s degrees. 3 Rick Fry, Cary Funk, and Brian Kennedy, “STEM jobs see uneven progress in increasing gender, racial and ethnic diversity,” Pew Research Center, April 1, 2021. The COVID-19 pandemic may have further shrunk the pipeline: Black and Hispanic students experienced sharper declines in fourth-grade math test scores during the pandemic compared with their White and Asian peers, wiping out decades of progress. 4 Sarah Mervosh and Ashley Wu, “Math scores fell in nearly every state, and reading dipped on national exam,” New York Times , October 24, 2022. Without intervention, it’s possible the lagging test scores will lead to a decrease in the number of Black students who eventually pursue STEM careers.

While much of the nonprofit sector’s work has increased diversity in STEM, there could be more targeted efforts from businesses specifically designed to encourage Black student participation. Only 20 percent of Fortune 100 companies have a K–12 STEM partnership focused on students in underserved communities, according to a McKinsey analysis.

Businesses can meet students where they are by underwriting technology courses or offering information sessions in predominantly Black communities. Numerous studies have documented the positive effect that a sense of belonging in education has on academic retention: K–12 students and first-year college students who feel a sense of belonging among their peers are likelier to participate in classroom discussions, to believe they will succeed in a subject area, and to be motivated. 5 Lynley H. Anderman, Tierra M. Freeman, and Jane M. Jensen, “Sense of belonging in college freshmen at the classroom and campus levels,” Journal of Experimental Education, 2010, Volume 75, Number 3. STEM programs that target schools with a high population of Black students are likely to help plug future talent gaps in tech.

A Pew Research survey published in April 2022 found that the percentage of Black adults who say “Black people have reached the highest levels of success” in a range of careers was highest for professional athletes and musicians, at more than double the rate of engineers and scientists, indicating that survey respondents don’t perceive STEM fields to be welcoming to Black talent (Exhibit 3). For students who may not have a role model in tech, community-focused approaches help increase exposure to both companies and role models.

Nonprofits have often led the charge in bringing greater STEM awareness to Black communities. One example is MITRE, an organization that provides tech expertise to the US government. MITRE gives its employees 40 paid hours of “civic duty” to participate in in-classroom and after-school programs at K–12 schools in Black and Hispanic communities; it also reimburses employees for expenses (like travel and parking) related to their participation in these programs. MITRE’s initiatives have exposed thousands of students and their parents to opportunities in STEM.

Even as companies encourage employees to participate in volunteer programs, they should be mindful to not add to Black employees’ workload or to make participation a requirement for promotion. They should encourage employees of all races—not just Black employees—to engage in racial-equity efforts.

Create stronger corporate HBCU partnerships

Historically Black colleges and universities (HBCUs) are a significant driver of economic mobility for Black people and produce many of the country’s Black technologists. Companies have been working with HBCUs to provide resources and create a talent pipeline  for STEM students for more than two decades. Boeing, IBM, and Netflix are just three of the many companies that have partnered with HBCUs.

Still, there’s room to improve the effectiveness of these partnerships.

The experience of one technology company might provide useful lessons. The company launched a lauded program that relied on volunteer employees to mentor HBCU students and teach courses but did not provide employees with incentives to participate. The program created internships for HBCU students, but there was no follow-through when the internships ended (and many of the HBCU interns did not go on to work at the company upon graduation). Also, the company partnered with only a small fraction of HBCUs across the country. Finally, while the company helped develop technology courses for HBCUs, it did not underwrite the costs of those programs or offer scholarships to students, some of whom took out additional student loans to participate in the program.

Organizations with money to invest in their future workforce can direct funds toward HBCU curriculum development, career offices, and faculty training. For instance, Harvard University runs a free data science pedagogy workshop for educators at HBCUs and other minority-serving institutions, to broaden the pipeline of future graduate students in the field. IBM is partnering with 13 HBCUs to build a new Quantum Center that gives students access to IBM quantum computers, as well as educational support and research opportunities. Ideally, businesses would be able to underwrite the cost of internships or related programs so that they are free or affordable for Black students.

Not all businesses will be able to afford national HBCU outreach or cost-subsidized internship programs, however. But even those with less cash on hand can better work with HBCUs and their students: those with internship programs can offer more professional development during internships to increase the chances a student is hired after graduation and expand partnerships beyond the universe of well-known HBCUs. They should also increase partnerships with non-HBCUs that have high Black and Hispanic student populations.

Expand opportunities for alternatively skilled talent

People without college degrees are likely to be overlooked by employers that still hire according to traditional standards. Of the 17 million Black workers in the United States, 65 percent developed their skills through alternative routes—meaning they have a high school diploma and may have military or workforce experience but do not have a bachelor’s degree. 6 “Spotlight on Black STARs: Insights for employers to access the skilled and diverse talent they’ve been missing,” Opportunity at Work, November 2, 2022. By this measure, jobs that require a bachelor’s degree are out of reach for most Black workers.

By removing the requirement for a bachelor’s degree, businesses immediately expand the applicant pool. Additionally, they can partner with platforms that help train “ready to learn” talent—people who have experience in other fields with transferable skills  but may require additional development—to find qualified candidates with nontraditional backgrounds.

Some businesses are already investing in such programs. Nasdaq and Oracle partner with Kura Labs, an online academy that offers free training and job placement for engineers in underserved communities. The organization says its efforts have resulted in $12 million in new wages in less than 18 months. Meanwhile, other companies including Pandora and Twitch have partnered with the platform OnRamp Technology, which works with more than 100 boot camps, online communities, and education and training providers. Three out of four people hired through OnRamp are people of color.

About the research

The results of a new McKinsey Black Tech Talent Survey help illustrate where problems persist. In July 2022, McKinsey surveyed 82 Black professionals in the United States across entry-level, mid-level, and C-suite technology roles, both within and outside technology companies. The survey aimed to understand the impact of increasing Black representation in tech roles across industries and opportunities to elevate Black tech talent into executive roles. While the findings may not be definitive, they are directionally representative. This research builds upon previous “Race in the workplace” studies  as well as existing work from the McKinsey Institute for Black Economic Mobility , which seeks to provide independent research to offer guidance on how to improve racial inequities around the world.

But recruiting ready-to-learn talent helps improve representation only if a company also reexamines its interview processes. Résumés that indicate a candidate is Black—either because of the candidate’s name, school, or work history, for example—have been found to generate fewer interview requests than résumés reflecting characteristics of White candidates. 7 Marianne Bertrand and Sendhil Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination, National Bureau of Economic Research, working paper, July 2003. In our survey of Black tech talent, respondents say their companies “do not do enough outreach” and “have not yet incorporated procedures like blind résumés” (stripping a résumé of any indicators of gender identity or race) to broaden talent pools (see sidebar, “About the research”).

Replace mentorship with sponsorship

Black tech professionals change companies every three and a half years on average, compared with every five or more years for their non-Black counterparts. This pattern continues over the course of a career: Black professionals with 21 years or more of tech experience have changed companies more than seven times on average, compared with six times for their non-Black peers. 8 Cecyl Hobbs, “Shaping the future of leadership for Black tech talent,” Russell Reynolds Associates, January 27, 2022. The higher attrition rate means Black talent is less likely to stay at a company long enough to be promoted.

In efforts to retain Black employees, some companies have created mentorship programs—but the programs aren’t always effective: across industries, only 13 percent of Black management-level employees and only 20 percent of Black entry-level employees strongly agree that their sponsors are effective at creating opportunities for them (Exhibit 4).

Mentorship programs may fail for a variety of reasons. A business may mandate mentor pairing for new hires, but often these relationships are transactional and lack the kind of connection that allows the relationship to last. (Employees who choose their mentees may do so according to familiar networks, like a shared school, or other factors that exclude Black employees.) Mentorship programs may also lack processes that guide mentors and mentees through the relationship and may measure only intangible or difficult-to-quantify metrics, like satisfaction with one’s mentor.

Ultimately, mentorship is not enough to keep Black tech employees from leaving companies. Sponsorship —the idea that senior leaders are tasked with creating apprenticeship and networking opportunities, as well as helping talent navigate transitions at work like a promotion—is more impactful. These relationships require both parties to create a development strategy with specific goals that are measurable.

Empower Black leaders to thrive

When asked to name the three most important initiatives for advancing Black talent in tech, 83 percent of the Black tech employees we surveyed placed advancement opportunities among their top three, ahead of inclusion seminars and external advocacy and investment. More than a third said advancement opportunities were the most important factor. There are additional ways companies can support Black tech talent beyond advancement opportunities, particularly when it comes to fostering an inclusive workplace (Exhibit 5).

Even when Black employees in tech successfully complete corporate leadership and executive training programs, a promotion may remain elusive. This may happen for two reasons: an existing Black tech leader might be skilled in one area (for example, IT project management) but lack the skills required in another (for example, data science) to grow into a C-suite-level executive role. Upskilling these employees in tech’s fastest-growing areas is one way they can be supported.

Additionally, businesses that are too focused on training Black tech talent without adopting organizational change are setting those employees up for failure. Partnering with organizations that create leadership training programs for aspiring leaders as well as existing leaders creates two streams of parallel growth at a company. It’s also important that these organizations are specifically focused on elevating Black tech talent, as general executive leadership programs may overlook some of the nuances of the Black experience in technology that shape someone’s career journey.

The Information Technology Senior Management Forum (ITSMF), a charitable organization that counts Amazon Web Services and PepsiCo among its partners, serves as an example of how to do this successfully. ITSMF offers a leadership academy for future Black tech talent, in addition to a management academy tailored for existing executives. Businesses that partner with ITSMF also engage in unconscious bias or cultural intelligence workshops and cohost networking events for prospective executive talent. Up to 80 percent of ITSMF leadership academy graduates received promotions within 18 months of completing the program, according to the group.

Seizing these five opportunities—at the K–12 level, in higher education, with alternatively skilled talent, in sponsorship, and in leadership training—will help to close the Black tech talent gap. Many businesses today are undertaking resiliency measures to prepare for tough times ahead and help curb losses. It is during such times of economic uncertainty that it is easiest for businesses to cut critical investments in Black tech talent, and most important that they do not.

Jan Shelly Brown is a partner in McKinsey’s New Jersey office, where Chris Perkins is an associate partner; Matthew Finney is a consultant in the Bay Area office; and Mark McMillan is a senior partner in the Washington, DC, office.

The authors wish to thank Tanguy Catlin, Tiffany Chen, Rob Levin, Roger Roberts, and Sonia Shah for their contributions to this article.

This article was edited by Alexandra Mondalek, an editor in the New York office.


Changing Partisan Coalitions in a Politically Divided Nation

Party identification among registered voters, 1994-2023

Pew Research Center conducted this analysis to explore partisan identification among U.S. registered voters across major demographic groups and how voters’ partisan affiliation has shifted over time. It also explores the changing composition of voters overall and the partisan coalitions.

For this analysis, we used annual totals of data from Pew Research Center telephone surveys (1994-2018) and online surveys (2019-2023) among registered voters. All telephone survey data was adjusted to account for differences in how people respond to surveys on the telephone compared with online surveys (refer to Appendix A for details).

All online survey data is from the Center’s nationally representative American Trends Panel . The surveys were conducted in both English and Spanish. Each survey is weighted to be representative of the U.S. adult population by gender, age, education, race and ethnicity and other categories. Read more about the ATP’s methodology , as well as how Pew Research Center measures many of the demographic categories used in this report .

The contours of the 2024 political landscape are the result of long-standing patterns of partisanship, combined with the profound demographic changes that have reshaped the United States over the past three decades.

Many of the factors long associated with voters’ partisanship remain firmly in place. For decades, gender, race and ethnicity, and religious affiliation have been important dividing lines in politics. This continues to be the case today.

Pie chart showing that in 2023, 49% of registered voters identify as Democrats or lean toward the Democratic Party, while 48% identify as Republicans or lean Republican.

Yet there also have been profound changes – in some cases as a result of demographic change, in others because of dramatic shifts in the partisan allegiances of key groups.

The combined effects of change and continuity have left the country’s two major parties at virtual parity: About half of registered voters (49%) identify as Democrats or lean toward the Democratic Party, while 48% identify as Republicans or lean Republican.

In recent decades, neither party has had a sizable advantage, but the Democratic Party has lost the edge it maintained from 2017 to 2021. (Explore this further in Chapter 1.)

Pew Research Center’s comprehensive analysis of party identification among registered voters – based on hundreds of thousands of interviews conducted over the past three decades – tracks the changes in the country and the parties since 1994. Among the major findings:

Bar chart showing that growing racial and ethnic diversity among voters has had a far greater impact on the composition of the Democratic Party than the Republican Party.

The partisan coalitions are increasingly different. Both parties are more racially and ethnically diverse than in the past. However, this has had a far greater impact on the composition of the Democratic Party than the Republican Party.

The share of voters who are Hispanic has roughly tripled since the mid-1990s; the share who are Asian has increased sixfold over the same period. Today, 44% of Democratic and Democratic-leaning voters are Hispanic, Black, Asian, another race or multiracial, compared with 20% of Republicans and Republican leaners. However, the Democratic Party’s advantages among Black and Hispanic voters, in particular, have narrowed somewhat in recent years. (Explore this further in Chapter 8.)

Trend chart comparing voters in 1996 and 2023, showing that since 1996, voters without a college degree have declined as a share of all voters, and they have shifted toward the Republican Party. It’s the opposite for college graduate voters.

Education and partisanship: The share of voters with a four-year bachelor’s degree keeps increasing, reaching 40% in 2023. And the gap in partisanship between voters with and without a college degree continues to grow, especially among White voters. More than six-in-ten White voters who do not have a four-year degree (63%) associate with the Republican Party, which is up substantially over the past 15 years. White college graduates are closely divided; this was not the case in the 1990s and early 2000s, when they mostly aligned with the GOP. (Explore this further in Chapter 2.)

Beyond the gender gap: By a modest margin, women voters continue to align with the Democratic Party (by 51% to 44%), while nearly the reverse is true among men (52% align with the Republican Party, 46% with the Democratic Party). The gender gap is about as wide among married men and women. The gap is wider among men and women who have never married; while both groups are majority Democratic, 37% of never-married men identify as Republicans or lean toward the GOP, compared with 24% of never-married women. (Explore this further in Chapter 3.)

A divide between old and young: Today, each younger age cohort is somewhat more Democratic-oriented than the one before it. The youngest voters (those ages 18 to 24) align with the Democrats by nearly two-to-one (66%, versus 34% who are Republican or lean toward the GOP); majorities of older voters (those in their mid-60s and older) identify as Republicans or lean Republican. While there have been wide age divides in American politics over the last two decades, this wasn’t always the case; in the 1990s there were only very modest age differences in partisanship. (Explore this further in Chapter 4.)

Dot plot chart by income tier showing that registered voters without a college degree differ substantially by income in their party affiliation. Non-college voters with middle, upper-middle and upper family incomes tend to align with the GOP. A majority with lower and lower-middle incomes identify as Democrats or lean Democratic.

Education and family income: Voters without a college degree differ substantially by income in their party affiliation. Those with middle, upper-middle and upper family incomes tend to align with the GOP. A majority with lower and lower-middle incomes identify as Democrats or lean Democratic. There are no meaningful differences in partisanship among voters with at least a four-year bachelor’s degree; across income categories, majorities of college graduate voters align with the Democratic Party. (Explore this further in Chapter 6.)

Rural voters move toward the GOP, while the suburbs remain divided: In 2008, when Barack Obama sought his first term as president, voters in rural counties were evenly split in their partisan loyalties. Today, Republicans hold a 25 percentage point advantage among rural residents (60% to 35%). There has been less change among voters in urban counties, who are mostly Democratic by a nearly identical margin (60% to 37%). The suburbs – perennially a political battleground – remain about evenly divided. (Explore this further in Chapter 7.)

Growing differences among religious groups: Mirroring movement in the population overall, the share of voters who are religiously unaffiliated has grown dramatically over the past 15 years. These voters, who have long aligned with the Democratic Party, have become even more Democratic over time: Today 70% identify as Democrats or lean Democratic. In contrast, Republicans have made gains among several groups of religiously affiliated voters, particularly White Catholics and White evangelical Protestants. White evangelical Protestants now align with the Republican Party by about a 70-point margin (85% to 14%). (Explore this further in Chapter 5.)

What this report tells us – and what it doesn’t

In most cases, the partisan allegiances of voters do not change a great deal from year to year. Yet as this study shows, the long-term shifts in party identification are substantial and say a great deal about how the country – and its political parties – have changed since the 1990s.

Bar chart showing that certain demographic groups are strengths and weaknesses for the Republican and Democratic coalitions of registered voters. For example, White evangelical Protestants, White non-college voters and veterans tend to associate with the GOP, while Black voters and religiously unaffiliated voters favor the Democrats.

The growing alignment between demographics and partisanship reveals an important aspect of deepening partisan polarization. Republicans and Democrats do not just hold different beliefs and opinions about major issues; they are also much more different racially, ethnically, geographically and in educational attainment than they used to be.

Yet over this period, there have been only modest shifts in overall partisan identification. Voters remain evenly divided, even as the two parties have grown further apart. The continuing close division in partisan identification among voters is consistent with the relatively narrow margins in the popular votes in most national elections over the past three decades.

Partisan identification provides a broad portrait of voters’ affinities and loyalties. But while it is indicative of voters’ preferences, it does not perfectly predict how people intend to vote in elections, or whether they will vote. In the coming months, Pew Research Center will release reports analyzing voters’ preferences in the presidential election, their engagement with the election and the factors behind candidate support.

Next year, we will release a detailed study of the 2024 election, based on validated voters from the Center’s American Trends Panel. It will examine the demographic composition and vote choices of the 2024 electorate and will provide comparisons to the 2020 and 2016 validated voter studies.

The partisan identification study is based on annual totals from surveys conducted on the Center’s American Trends Panel from 2019 to 2023 and telephone surveys conducted from 1994 to 2018. The survey data was adjusted to account for differences in how the surveys were conducted. For more information, refer to Appendix A .

Previous Pew Research Center analyses of voters’ party identification relied on telephone survey data. This report, for the first time, combines data collected in telephone surveys with data from online surveys conducted on the Center’s nationally representative American Trends Panel.

Directly comparing answers from online and telephone surveys is complex because there are differences in how questions are asked of respondents and in how respondents answer those questions. Together these differences are known as “mode effects.”

As a result of mode effects, it was necessary to adjust telephone trends for leaned party identification in order to allow for direct comparisons over time.

In this report, telephone survey data from 1994 to 2018 is adjusted to align it with online survey responses. In 2014, Pew Research Center randomly assigned respondents to answer a survey by telephone or online. The party identification data from this survey was used to calculate an adjustment for differences between survey modes, which is applied to all telephone survey data in this report.
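
As a loose illustration of the logic, the sketch below applies a single additive shift, estimated from a randomized mode experiment, to an unadjusted telephone trend. All numbers are invented, and the Center’s actual adjustment procedure (described in Appendix A) is more involved than this.

```python
# All figures are invented for illustration; the real adjustment procedure
# is described in Appendix A of the report.
experiment_online = 0.50  # Dem/lean-Dem share in the online arm (hypothetical)
experiment_phone = 0.47   # same measure in the telephone arm (hypothetical)
mode_effect = experiment_online - experiment_phone  # shift attributed to survey mode

# Unadjusted telephone estimates by year (hypothetical).
phone_trend = {1994: 0.47, 2006: 0.48, 2018: 0.49}

# Shift each telephone estimate onto the online scale so the two series
# can be compared directly.
adjusted_trend = {year: round(share + mode_effect, 3) for year, share in phone_trend.items()}
print(adjusted_trend)
```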

Please refer to Appendix A for more details.


National Center for Science and Engineering Statistics


International Collaboration in Selected Critical and Emerging Fields: COVID-19 and Artificial Intelligence

April 11, 2024

Research collaboration is a critical strategy for pooling resources, sharing expertise, and accelerating innovation, and institutions may use collaboration to synthesize novel ideas and bridge knowledge or material gaps (Katz and Hicks 1997; Lee, Walsh, and Wang 2015; Wagner et al. 2001). Ongoing research on the transformative potential of artificial intelligence (AI) and the 2020 effort to mitigate and treat COVID-19 are two cases in which scientific progress has been important. Both fields have been recognized as national priorities (https://www.whitehouse.gov/priorities/) and present complex challenges that both domestic and international institutions are motivated to overcome.

A country’s collaboration patterns, both domestic and international, can indicate the presence of expertise or the necessity of knowledge and resource sharing, as countries tend to collaborate internationally less in fields when they have sufficient resources within their own borders (Chinchilla-Rodríguez, Sugimoto, and Larivière 2019). International research collaboration can provide a rapid response to societal challenges, including public health crises (Carvalho et al. 2023) or technological paradigm shifts, and strong international collaborators play a large role in shaping the direction and priorities of research fields worldwide (Leydesdorff and Wagner 2009). A concentration on domestic research can indicate the presence of sufficient domestic knowledge and resources or an interest in preserving in-house expertise. This InfoBrief examines the extent to which top producers of science and engineering (S&E) articles engaged in domestic and international collaborations in AI and COVID-19 research.

Growth in Artificial Intelligence Articles

Between 2003 and 2022, the number of published articles in AI grew faster than the number of articles in computer science, due in part to the newness of the AI field compared with the more established field of computer science (see table SPBS-22 in National Science Board, National Science Foundation, 2023, Publications Output: U.S. Trends and International Comparisons, Science and Engineering Indicators 2024, NSB-2023-33, available at https://ncses.nsf.gov/pubs/nsb202333). AI articles worldwide grew by 1,100% during this period, reaching 123,402 articles in 2022 (see NSB-2023-33, table SPBS-99), or 4% of all S&E publications globally (see NSB-2023-33, figure PBS-3), compared with 290% growth in computer science articles (see NSB-2023-33, table SPBS-22). From 2017 to 2022, the six countries with the highest overall publication outputs (see NSB-2023-33, figure PBS-3) were also the countries with the highest AI research output (China, India, the United States, Japan, the United Kingdom, and Germany) (figure 1). In 2022, the top two producers of AI research articles were China (42,524 articles, or 35% of total AI publication output) and India (22,557, or 18%), followed by the United States (12,642, or 10%). Germany, Japan, and the United Kingdom published similar numbers of publications, ranging between 3,700 and 4,700 articles (3%–4%).


Figure 1. AI articles, by selected country: 2003–22

AI = artificial intelligence.

AI article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in science and engineering fields from Scopus. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are classified by their year of publication and are assigned to a region, country, or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). Data for all regions, countries, and economies are available in supplemental table SPBS-99 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 ).

Source(s): National Center for Science and Engineering Statistics; Science-Metrix; Elsevier, Scopus abstract and citation database, accessed April 2023.

Collaboration Trends in Artificial Intelligence Articles

Coauthorship trends on S&E articles shed light on overall collaboration practices. The affiliations of authors to their home institutions and countries are used to infer whether collaboration has occurred across institutions, both domestically and internationally. Three types of collaboration are detailed in this InfoBrief, and an article is the unit of analysis. An article with at least one author from an institution of a given country is classified into one of three categories: an international collaboration, if an author from any other country is present; a domestic collaboration, if all authors are from the same country but are affiliated with more than one institution; or a single institution article, if all authors share the same institutional affiliation or the article is solo authored.
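
The classification rule reduces to a few set comparisons over author affiliations. Below is a minimal sketch of that logic under the assumption that each author is represented as a (country, institution) pair; the function name and data layout are invented for illustration and are not the Science-Metrix implementation.

```python
def classify_article(authors, focal_country):
    """Classify an article for a focal country into one of the three
    categories described above. Each author is a (country, institution) pair.
    """
    countries = {country for country, _ in authors}
    institutions = {institution for _, institution in authors}
    if focal_country not in countries:
        raise ValueError("article has no author from the focal country")
    if len(countries) > 1:
        return "international collaboration"
    if len(institutions) > 1:
        return "domestic collaboration"
    return "single institution"

# A hypothetical paper with authors from two countries, viewed from the
# United States, counts as an international collaboration.
paper = [("United States", "MIT"), ("China", "Tsinghua University")]
print(classify_article(paper, "United States"))
```

Note that, on this whole-count basis, the same paper would also count as an international collaboration for China; each contributing country is credited with one article.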

Collaboration Trends

From 2017 to 2022, 37% of U.S. research papers on AI were the result of international collaboration, placing the United States in the middle of the five other top producers of AI research papers: the United Kingdom (61%) and Germany (40%) produced internationally collaborative research at higher rates, while Japan (25%), China (17%), and India (10%) produced it at lower rates (figure 2). Rates of international collaboration for the United States were slightly lower for AI research papers than for all S&E research papers (37% versus 39%). Likewise, across the other five top producers of AI research papers, rates of international collaboration were lower for AI research papers than for all S&E research papers. Compared with other countries, China had the greatest proportion of AI papers that were domestic collaborations (41%). Across the six top-producing countries, articles produced by a single institution were more common in AI research than in all S&E research (42% versus 26%).

Figure 2. International collaboration, domestic collaboration, and single institution publications on AI research and overall international collaboration on all S&E research, by selected country: 2017–22

AI = artificial intelligence; S&E = science and engineering.

AI articles are assigned to a country, or economy on the basis of the institutional addresses of the authors listed in the article. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of AI articles to feature collaboration or to the proportion of general articles across all fields to feature collaboration. Articles were excluded when one or more coauthored publications had incomplete address information in the Scopus database; therefore, they cannot be reliably identified as international or domestic collaborations. Data for all regions, countries, and economies are available in supplemental table SPBS-99 and supplemental table SPBS-33 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 and https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-33 ).

International Collaboration

Overall, scientific research has become increasingly collaborative over time (Gazni, Sugimoto, and Didegah 2012; Wuchty, Jones, and Uzzi 2007). Although the rate of international collaboration in AI publications has been lower than the rate of international collaboration across all S&E fields over the past 5 years, international collaboration in AI articles gradually increased between 2003 and 2022. By country, international collaborations in AI increased in Japan (from 15% to 28%), the United States (from 24% to 39%), Germany (from 37% to 42%), and the United Kingdom (from 36% to 66%) (figure 3). Over the same period, India and China showed no increasing trend, though their rates fluctuated. For example, after China exhibited a period of increased international collaboration in AI research, from 7% in 2009 to 23% in 2015, the rate decreased to 16% in 2022.
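The yearly rates plotted in figure 3 are simply the share of a country's AI articles classified as international collaborations in each publication year. A minimal sketch with made-up counts (the published figures come from supplemental table SPBS-99):

    # Compute a yearly international collaboration rate from classified
    # articles. The (year, category) pairs below are invented for
    # illustration; they are not NCSES data.
    from collections import defaultdict

    classified = [
        (2021, "international collaboration"),
        (2021, "domestic collaboration"),
        (2021, "single institution"),
        (2022, "international collaboration"),
        (2022, "international collaboration"),
        (2022, "single institution"),
    ]

    totals = defaultdict(int)
    intl = defaultdict(int)
    for year, category in classified:
        totals[year] += 1
        if category == "international collaboration":
            intl[year] += 1

    for year in sorted(totals):
        print(year, f"{intl[year] / totals[year]:.0%}")  # 2021 33%, 2022 67%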

International collaboration on AI articles, by selected country: 2003–22

AI article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in science and engineering fields from Scopus. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are assigned to a region, country, or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of AI articles featuring collaboration. Data for all regions, countries, and economies are available in supplemental table SPBS-99 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 ).

Domestic Collaborations and Single Institution Publications

The proportion of single institution publications in AI decreased over time in the United States, from 48% in 2003 to 31% in 2022 (figure 4). Despite this decrease, the proportion of U.S. single institution publications remained higher in AI research than in all S&E research, where it decreased from 36% to 20% over the same period. The rate of domestic collaboration in AI between U.S. institutions remained relatively stable from 2003 to 2022, ranging between 25% and 30%. In China, the proportion of single institution publications in AI decreased from 59% to 38% between 2003 and 2022, albeit with more fluctuation. China's proportions of single institution publications in AI and in all S&E fields were similar until 2007, after which the proportion of single institution papers in AI research became higher, while the overall proportion of single institution papers in all S&E research continued to decrease.

Collaborative and single institution articles on AI and single institution articles on all S&E research in the United States and China: 2003–22

Article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in S&E fields from Scopus. The subset of AI articles was determined by All Science Journal Classification subject matter classification, supplemented by an algorithm that used a series of article characteristics to determine the field of papers published in multidisciplinary journals. Articles are assigned to a region, country, or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of AI articles featuring collaboration or to the proportion of articles across all fields featuring collaboration. Articles were excluded when one or more coauthored publications had incomplete address information in the Scopus database and therefore could not be reliably identified as international or domestic collaborations. Data for all regions, countries, and economies are available in supplemental table SPBS-99 and supplemental table SPBS-33 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-99 and https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-33 ).

COVID-19 Research Collaboration

In 2020, COVID-19 was identified as a national priority ( https://www.whitehouse.gov/priorities/ ), and this shift in research priorities may have affected collaboration patterns in COVID-19 research that year. In 2020, 35% of the United States' published research on COVID-19 involved international collaborations, which was lower than the rates in the United Kingdom (55%), Germany (52%), and Japan (45%) but higher than the rates in China (27%) and India (28%) (figure 5). In the United Kingdom and Germany, the rates of international collaboration were higher for all S&E research (65% and 55%, respectively) than for COVID-19 research.

International collaboration, domestic collaboration, and single institution publications on COVID-19 research and overall international collaboration on all S&E research, by selected country: 2020

S&E = science and engineering.

Article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in S&E fields from Scopus. Articles are assigned to a region, country, or economy on the basis of the institutional addresses of the authors listed in the article. Articles are credited on a whole count basis (i.e., for articles produced by authors from different countries, each country is credited for one article). The percentages refer to the proportion of COVID-19 articles featuring collaboration or to the proportion of articles across all fields featuring collaboration. Articles were excluded when one or more coauthored publications had incomplete address information in the Scopus database and therefore could not be reliably identified as international or domestic collaborations. Data for all regions, countries, and economies are available in supplemental table SPBS-91 and supplemental table SPBS-35 in Publications Output: U.S. Trends and International Comparisons ( https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-91 and https://ncses.nsf.gov/pubs/nsb202333/table/SPBS-35 ).

National Center for Science and Engineering Statistics; Science-Metrix; Elsevier, Scopus abstract and citation database, accessed April 2021.

Although each of the top-producing countries had a lower rate of international collaboration in AI research than in all S&E research, the results were mixed for COVID-19. As the number of AI articles has increased, the rate of international collaboration has also increased. For COVID-19 research in 2020, only some of the top-producing countries had lower rates of international collaboration in COVID-19 research than in all S&E research.

Data Sources, Limitations, and Availability

Publication data are derived from a large database of publication records that was developed for Science and Engineering Indicators 2024, Publications Output: U.S. Trends and International Comparisons (NSB-2023-33), from the Scopus database by Elsevier. The publication counts and coauthorship information presented are derived from information about research articles and conference papers (hereafter referred to collectively as articles) published in conference proceedings and peer-reviewed scientific and technical journals. Elsevier selects journals and conference proceedings for the Scopus database based on evaluation by an international group of subject-matter experts (see NSB-2023-33, Technical Appendix), and the National Center for Science and Engineering Statistics (NCSES) undertakes additional filtering of the Scopus data to ensure that the statistics presented in Science and Engineering Indicators measure original and high-quality research publications (Science-Metrix 2023). Although the listed affiliation is generally reflective of the locations where research was conducted, authors may have honorary affiliations, have moved, or have experienced other circumstances that prevent their affiliations from exactly reflecting the research environment.

The subset of AI articles was determined by All Science Journal Classification subject matter classification. Global coronavirus publication output data for 2020 were extracted from two different sources. The COVID-19 Open Research Dataset (CORD-19) was created through a partnership between the Office of Science and Technology Policy, the Allen Institute for Artificial Intelligence, the Chan Zuckerberg Initiative, Microsoft Research, Kaggle, and the National Library of Medicine at the National Institutes of Health, coordinated by Georgetown University's Center for Security and Emerging Technology. CORD-19 is a highly inclusive, noncurated database. The other coronavirus publication output data source was the Scopus database, which permits more refined analysis because it includes more fields (e.g., the institutional country of each author). (See NSB-2021-4, Technical Appendix.)

1 See table SPBS-22 in National Science Board, National Science Foundation. 2023. Publications Output: U.S. Trends and International Comparisons. Science and Engineering Indicators 2024. NSB-2023-33. Available at https://ncses.nsf.gov/pubs/nsb202333 .

2 See NSB-2023-33, table SPBS-99 .

3 See NSB-2023-33, figure PBS-3 .

4 See NSB-2023-33, table SPBS-22 .

5 See NSB-2023-33, figure PBS-3 .

Carvalho DS, Felipe LL, Albuquerque PC, Zicker F, Fonseca BDP. 2023. Leadership and International Collaboration on COVID-19 Research: Reducing the North–South Divide? Scientometrics 128:4689–705. Available at https://doi.org/10.1007/s11192-023-04754-x .

Chinchilla-Rodríguez Z, Sugimoto CR, Larivière V. 2019. Follow the Leader: On the Relationship between Leadership and Scholarly Impact in International Collaborations. PLOS ONE 14:e0218309. Available at https://doi.org/10.1371/journal.pone.0218309 .

Gazni A, Sugimoto CR, Didegah F. 2012. Mapping World Scientific Collaboration: Authors, Institutions, and Countries. Journal of the American Society for Information Science and Technology 63:323–35. Available at https://doi.org/10.1002/asi.21688 .

Katz JS, Hicks D. 1997. How Much Is a Collaboration Worth? A Calibrated Bibliometric Model. Scientometrics 40:541–54. Available at https://doi.org/10.1007/BF02459299 .

Lee Y-N, Walsh JP, Wang J. 2015. Creativity in Scientific Teams: Unpacking Novelty and Impact. Research Policy 44:684–97. Available at https://doi.org/10.1016/j.respol.2014.10.007 .

Leydesdorff L, Wagner CS. 2008. International Collaboration in Science and the Formation of a Core Group. Journal of Informetrics 2:317–25. Available at https://doi.org/10.1016/j.joi.2008.07.003 .

Science-Metrix. 2023. Bibliometric Indicators for the Science and Engineering Indicators 2024. Technical Documentation. Available at https://science-metrix.com/bibliometrics-indicators-for-the-science-and-engineering-indicators-2024-technical-documentation/ . Accessed 26 August 2023.

Wagner CS, Brahmakulam IT, Jackson BA, Wong A, Yoda T. 2001. Science and Technology Collaboration: Building Capacity in Developing Countries? Santa Monica, CA: RAND Corporation. Available at https://www.rand.org/pubs/monograph_reports/MR1357z0.html .

Wuchty S, Jones BF, Uzzi B. 2007. The Increasing Dominance of Teams in Production of Knowledge. Science 316:1036. Available at https://doi.org/10.1126/science.1136099 .

Suggested Citation

Boothby C, Schneider B; National Center for Science and Engineering Statistics (NCSES). 2024. International Collaboration in Selected Critical and Emerging Fields: COVID-19 and Artificial Intelligence. NSF 24-323. Alexandria, VA: National Science Foundation. Available at https://ncses.nsf.gov/pubs/nsf24323 .

Report Authors

Clara Boothby, ORISE Fellow, NCSES. E-mail: [email protected]

Benjamin Schneider, Interdisciplinary Science Analyst, NCSES. Tel: 703.292.8828. E-mail: [email protected]

National Center for Science and Engineering Statistics, Directorate for Social, Behavioral and Economic Sciences, National Science Foundation, 2415 Eisenhower Avenue, Suite W14200, Alexandria, VA 22314. Tel: (703) 292-8780; FIRS: (800) 877-8339; TDD: (800) 281-8749. E-mail: [email protected]


The Budget Lab at Yale Launches to Provide Novel Analysis for Federal Policy Proposals

The Budget Lab at Yale , a nonpartisan policy research center, launched on April 12 to provide in-depth analysis of federal policy proposals impacting the American economy. For too long, according to the center's founders, policy analysis has been narrowly focused on short-term cost estimates, or traditional budget scores. The Budget Lab aims to fill a critical gap in policy evaluation, focusing particularly on the long-term effects of proposed policies on the economy, the income distribution, and recipients. The Budget Lab's initial analysis, released today, examines both the Tax Cuts and Jobs Act (TCJA) and the Child Tax Credit (CTC) through this broader lens.

The Budget Lab is co-founded by leading economic advisors and academics whose goal is to bring fresh ideas and new methods to policy making. 

  • Natasha Sarin, Co-founder and President, is a Professor of Law at Yale Law School with a secondary appointment in the Finance Department at the Yale School of Management. She served as Deputy Assistant Secretary for Economic Policy and later as a Counselor to U.S. Treasury Secretary Janet Yellen.
  • Danny Yagan, Co-founder and Chief Economist, is an Associate Professor of Economics at UC Berkeley and a Research Associate of the National Bureau of Economic Research. He was the Chief Economist of the White House Office of Management and Budget.
  • Martha Gimbel, Co-founder and Executive Director, is a former Senior Advisor at the White House Council of Economic Advisers, Senior Policy Advisor to the U.S. Secretary of Labor, and Senior Economist and Research Director at Congress's Joint Economic Committee.

“For many of the greatest policy challenges of our time — investing in children, combating climate change — their most important impact is not on short-run GDP. We need to understand the effects on poverty, on emissions reduction, on the income distribution,” said Sarin. “We are excited to share the tools we have built to analyze the fiscal and social impacts of government policies so policymakers can make better choices.”

The Budget Lab's work will look at issues not captured by current budget policy assessment methods, particularly the full scope of costs and returns related to policies including the child tax credit, tax cuts, paid family leave, deficit reduction, and universal pre-K. The Lab bridges this gap by combining existing open-source models with its own microsimulation tax model to provide fast, transparent, and innovative estimates that unlock deeper insights.

“Our approach implements a new lens to improve existing conventions for distributional impacts by showing how policies affect families over time,” added Yagan. 

One key aspect of the Budget Lab’s commitment to transparency is its open-access model code. The code used to produce analysis is publicly available, fostering trust and allowing policymakers to understand how the Budget Lab arrives at its results. It also allows for the infrastructure of the budget model the team is developing to be leveraged by others interested in similar analysis. 

“Our aim is to provide rapid responses to important policy questions with the ability to think not only about the costs of policies but also about benefits and the return on investments,” said Martha Gimbel.  “Our tax microsimulation model, budget estimates, and interactives will paint a broader and more realistic picture of how Americans will benefit from proposed government initiatives.”  

The Budget Lab is hosting a launch event at the National Press Club on April 12, where the leadership team will share new research on budget scoring for the TCJA and CTC. The event will include remarks by Shalanda Young, Director of the Office of Management and Budget, and a panel discussion with Joshua Bolten, former Director of the Office of Management and Budget and White House Chief of Staff for President George W. Bush, and Doug Holtz-Eakin, former Director of the Congressional Budget Office and economic policy advisor to Sen. John McCain; the panel will be moderated by Greg Ip of The Wall Street Journal.

Budget Lab Team

In addition to the Budget Lab co-founders, the team includes leading economists who have extensive experience in the public sector. 

Ernie Tedeschi, Director of Economics, was most recently the chief economist at the White House Council of Economic Advisers. Rich Prisinzano, Director of Policy Analysis, previously served at the Penn Wharton Budget Model and for over a decade as an economist in the Office of Tax Analysis at the U.S. Department of the Treasury. John Ricco, Associate Director of Policy Analysis, is an economic researcher with a decade of experience building microsimulation models to inform public policy debates; he was formerly with the Penn Wharton Budget Model and a research analyst at the International Monetary Fund. Harris Eppsteiner, Associate Director of Policy Analysis, was a Special Assistant to the Chairman and a research economist at the White House Council of Economic Advisers.

A comparative experimental study on the collection and analysis of DNA samples from under fingernail material

Elif Yüksel, Sukriye Karadayı, Tulin Ozbek, Beytullah Karadayı, A comparative experimental study on the collection and analysis of DNA samples from under fingernail material, Forensic Sciences Research, 2024, owae025, https://doi.org/10.1093/fsr/owae025

In cases of murder and rape where there is physical contact between the perpetrator and the victim, analysis of the victim's nail material is quite valuable. Although it is possible that foreign DNA detected in fingernail material does not belong to the perpetrator of the incident, if it does, it may provide useful findings for solving the case. Fingernail material collected after an incident often contains DNA from more than one contributor, resulting in mixed DNA profiles. The efficiency of the sample collection procedure is therefore of particular importance, as mixtures can complicate the interpretation of the autosomal STR analyses used to identify the individual or individuals. The aim of this study is to compare three different fingernail material collection procedures (thick-tipped swabbing, thin-tipped swabbing, and nail clipping) to determine the most efficient sample collection procedure and to contribute to routine investigations to identify the assailant in forensic cases. In our study, under fingernail material was collected from 12 volunteer couples by the three methods. To compare the efficiency of the three methods, the profiles obtained were classified by the number of female and male alleles detected. Among the STR profiles obtained, nail clipping yielded a ‘high level DNA mixture' (a profile containing 12 or more female alleles) in 58.3% (n = 7) of samples, whereas 75% (n = 9) of the samples collected with thin-tipped cotton-toothpick swabs yielded a ‘full male profile'. In conclusion, our study shows that thin-tipped cotton-toothpick swabs are the most efficient of the three fingernail material collection procedures for recovering a male DNA profile. We suggest that using thin-tipped swabs manufactured to a specific standard, instead of the standard-size swabs commonly used in routine crime investigations to identify a perpetrator from fingernail material, may improve the efficiency of processing the nails and evaluating the evidence.
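For illustration only, the allele-count categorization implied by the abstract might look like the sketch below. The 12-female-allele threshold and the ‘high level DNA mixture' and ‘full male profile' labels come from the abstract; the function, its inputs, and the intermediate label are hypothetical:

    # Hedged sketch of the profile categorization implied by the abstract.
    # The threshold and the two named categories follow the abstract; the
    # intermediate label is an assumption added for completeness.
    def categorize_profile(female_alleles: int, male_alleles: int) -> str:
        if female_alleles == 0 and male_alleles > 0:
            return "Full male profile"
        if female_alleles >= 12:
            return "High level DNA mixture"
        return "Partial mixture"  # assumed label for intermediate cases

    print(categorize_profile(female_alleles=0, male_alleles=30))   # Full male profile
    print(categorize_profile(female_alleles=14, male_alleles=18))  # High level DNA mixture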

