data collection in case study research

The Ultimate Guide to Qualitative Research - Part 1: The Basics

Introduction and overview
What is qualitative research?
What is qualitative data?
Examples of qualitative data
Qualitative vs. quantitative research
Mixed methods
Qualitative research preparation
Theoretical perspective
Theoretical framework
Literature reviews

Research question

Conceptual framework
Conceptual vs. theoretical framework

Data collection

Qualitative research methods
Focus groups
Observational research

What is a case study?

Applications for case study research, what is a good case study, process of case study design, benefits and limitations of case studies.

Ethnographical research
Ethical considerations
Confidentiality and privacy
Power dynamics
Reflexivity

Case studies

Case studies are essential to qualitative research , offering a lens through which researchers can investigate complex phenomena within their real-life contexts. This chapter explores the concept, purpose, applications, examples, and types of case studies and provides guidance on how to conduct case study research effectively.

Whereas quantitative methods look at phenomena at scale, case study research looks at a concept or phenomenon in considerable detail. While analyzing a single case can help understand one perspective regarding the object of research inquiry, analyzing multiple cases can help obtain a more holistic sense of the topic or issue. Let's provide a basic definition of a case study, then explore its characteristics and role in the qualitative research process.

Definition of a case study

A case study in qualitative research is a strategy of inquiry that involves an in-depth investigation of a phenomenon within its real-world context. It provides researchers with the opportunity to acquire an in-depth understanding of intricate details that might not be as apparent or accessible through other methods of research. The specific case or cases being studied can be a single person, group, or organization – demarcating what constitutes a relevant case worth studying depends on the researcher and their research question .

Among qualitative research methods , a case study relies on multiple sources of evidence, such as documents, artifacts, interviews , or observations , to present a complete and nuanced understanding of the phenomenon under investigation. The objective is to illuminate the readers' understanding of the phenomenon beyond its abstract statistical or theoretical explanations.

Characteristics of case studies

Case studies typically possess a number of distinct characteristics that set them apart from other research methods. These characteristics include a focus on holistic description and explanation, flexibility in the design and data collection methods, reliance on multiple sources of evidence, and emphasis on the context in which the phenomenon occurs.

Furthermore, case studies can often involve a longitudinal examination of the case, meaning they study the case over a period of time. These characteristics allow case studies to yield comprehensive, in-depth, and richly contextualized insights about the phenomenon of interest.

The role of case studies in research

Case studies hold a unique position in the broader landscape of research methods aimed at theory development. They are instrumental when the primary research interest is to gain an intensive, detailed understanding of a phenomenon in its real-life context.

In addition, case studies can serve different purposes within research - they can be used for exploratory, descriptive, or explanatory purposes, depending on the research question and objectives. This flexibility and depth make case studies a valuable tool in the toolkit of qualitative researchers.

Remember, a well-conducted case study can offer a rich, insightful contribution to both academic and practical knowledge through theory development or theory verification, thus enhancing our understanding of complex phenomena in their real-world contexts.

What is the purpose of a case study?

Case study research aims for a more comprehensive understanding of phenomena, requiring various research methods to gather information for qualitative analysis . Ultimately, a case study can allow the researcher to gain insight into a particular object of inquiry and develop a theoretical framework relevant to the research inquiry.

Why use case studies in qualitative research?

Using case studies as a research strategy depends mainly on the nature of the research question and the researcher's access to the data.

Conducting case study research provides a level of detail and contextual richness that other research methods might not offer. They are beneficial when there's a need to understand complex social phenomena within their natural contexts.

The explanatory, exploratory, and descriptive roles of case studies

Case studies can take on various roles depending on the research objectives. They can be exploratory when the research aims to discover new phenomena or define new research questions; they are descriptive when the objective is to depict a phenomenon within its context in a detailed manner; and they can be explanatory if the goal is to understand specific relationships within the studied context. Thus, the versatility of case studies allows researchers to approach their topic from different angles, offering multiple ways to uncover and interpret the data .

The impact of case studies on knowledge development

Case studies play a significant role in knowledge development across various disciplines. Analysis of cases provides an avenue for researchers to explore phenomena within their context based on the collected data.

This can result in the production of rich, practical insights that can be instrumental in both theory-building and practice. Case studies allow researchers to delve into the intricacies and complexities of real-life situations, uncovering insights that might otherwise remain hidden.

Types of case studies

In qualitative research , a case study is not a one-size-fits-all approach. Depending on the nature of the research question and the specific objectives of the study, researchers might choose to use different types of case studies. These types differ in their focus, methodology, and the level of detail they provide about the phenomenon under investigation.

Understanding these types is crucial for selecting the most appropriate approach for your research project and effectively achieving your research goals. Let's briefly look at the main types of case studies.

Exploratory case studies

Exploratory case studies are typically conducted to develop a theory or framework around an understudied phenomenon. They can also serve as a precursor to a larger-scale research project. Exploratory case studies are useful when a researcher wants to identify the key issues or questions which can spur more extensive study or be used to develop propositions for further research. These case studies are characterized by flexibility, allowing researchers to explore various aspects of a phenomenon as they emerge, which can also form the foundation for subsequent studies.

Descriptive case studies

Descriptive case studies aim to provide a complete and accurate representation of a phenomenon or event within its context. These case studies are often based on an established theoretical framework, which guides how data is collected and analyzed. The researcher is concerned with describing the phenomenon in detail, as it occurs naturally, without trying to influence or manipulate it.

Explanatory case studies

Explanatory case studies are focused on explanation - they seek to clarify how or why certain phenomena occur. Often used in complex, real-life situations, they can be particularly valuable in clarifying causal relationships among concepts and understanding the interplay between different factors within a specific context.

Intrinsic, instrumental, and collective case studies

These three categories of case studies focus on the nature and purpose of the study. An intrinsic case study is conducted when a researcher has an inherent interest in the case itself. Instrumental case studies are employed when the case is used to provide insight into a particular issue or phenomenon. A collective case study, on the other hand, involves studying multiple cases simultaneously to investigate some general phenomena.

Each type of case study serves a different purpose and has its own strengths and challenges. The selection of the type should be guided by the research question and objectives, as well as the context and constraints of the research.

The flexibility, depth, and contextual richness offered by case studies make this approach an excellent research method for various fields of study. They enable researchers to investigate real-world phenomena within their specific contexts, capturing nuances that other research methods might miss. Across numerous fields, case studies provide valuable insights into complex issues.

Critical information systems research

Case studies provide a detailed understanding of the role and impact of information systems in different contexts. They offer a platform to explore how information systems are designed, implemented, and used and how they interact with various social, economic, and political factors. Case studies in this field often focus on examining the intricate relationship between technology, organizational processes, and user behavior, helping to uncover insights that can inform better system design and implementation.

Health research

Health research is another field where case studies are highly valuable. They offer a way to explore patient experiences, healthcare delivery processes, and the impact of various interventions in a real-world context.

Case studies can provide a deep understanding of a patient's journey, giving insights into the intricacies of disease progression, treatment effects, and the psychosocial aspects of health and illness.

Asthma research studies

Specifically within medical research, studies on asthma often employ case studies to explore the individual and environmental factors that influence asthma development, management, and outcomes. A case study can provide rich, detailed data about individual patients' experiences, from the triggers and symptoms they experience to the effectiveness of various management strategies. This can be crucial for developing patient-centered asthma care approaches.

Other fields

Apart from the fields mentioned, case studies are also extensively used in business and management research, education research, and political sciences, among many others. They provide an opportunity to delve into the intricacies of real-world situations, allowing for a comprehensive understanding of various phenomena.

Case studies, with their depth and contextual focus, offer unique insights across these varied fields. They allow researchers to illuminate the complexities of real-life situations, contributing to both theory and practice.

Whatever field you're in, ATLAS.ti puts your data to work for you

Download a free trial of ATLAS.ti to turn your data into insights.

Understanding the key elements of case study design is crucial for conducting rigorous and impactful case study research. A well-structured design guides the researcher through the process, ensuring that the study is methodologically sound and its findings are reliable and valid. The main elements of case study design include the research question , propositions, units of analysis, and the logic linking the data to the propositions.

The research question is the foundation of any research study. A good research question guides the direction of the study and informs the selection of the case, the methods of collecting data, and the analysis techniques. A well-formulated research question in case study research is typically clear, focused, and complex enough to merit further detailed examination of the relevant case(s).

Propositions

Propositions, though not necessary in every case study, provide a direction by stating what we might expect to find in the data collected. They guide how data is collected and analyzed by helping researchers focus on specific aspects of the case. They are particularly important in explanatory case studies, which seek to understand the relationships among concepts within the studied phenomenon.

Units of analysis

The unit of analysis refers to the case, or the main entity or entities that are being analyzed in the study. In case study research, the unit of analysis can be an individual, a group, an organization, a decision, an event, or even a time period. It's crucial to clearly define the unit of analysis, as it shapes the qualitative data analysis process by allowing the researcher to analyze a particular case and synthesize analysis across multiple case studies to draw conclusions.

Argumentation

This refers to the inferential model that allows researchers to draw conclusions from the data. The researcher needs to ensure that there is a clear link between the data, the propositions (if any), and the conclusions drawn. This argumentation is what enables the researcher to make valid and credible inferences about the phenomenon under study.

Understanding and carefully considering these elements in the design phase of a case study can significantly enhance the quality of the research. It can help ensure that the study is methodologically sound and its findings contribute meaningful insights about the case.

Ready to jumpstart your research with ATLAS.ti?

Conceptualize your research project with our intuitive data analysis interface. Download a free trial today.

Conducting a case study involves several steps, from defining the research question and selecting the case to collecting and analyzing data . This section outlines these key stages, providing a practical guide on how to conduct case study research.

Defining the research question

The first step in case study research is defining a clear, focused research question. This question should guide the entire research process, from case selection to analysis. It's crucial to ensure that the research question is suitable for a case study approach. Typically, such questions are exploratory or descriptive in nature and focus on understanding a phenomenon within its real-life context.

Selecting and defining the case

The selection of the case should be based on the research question and the objectives of the study. It involves choosing a unique example or a set of examples that provide rich, in-depth data about the phenomenon under investigation. After selecting the case, it's crucial to define it clearly, setting the boundaries of the case, including the time period and the specific context.

Previous research can help guide the case study design. When considering a case study, an example of a case could be taken from previous case study research and used to define cases in a new research inquiry. Considering recently published examples can help understand how to select and define cases effectively.

Developing a detailed case study protocol

A case study protocol outlines the procedures and general rules to be followed during the case study. This includes the data collection methods to be used, the sources of data, and the procedures for analysis. Having a detailed case study protocol ensures consistency and reliability in the study.

The protocol should also consider how to work with the people involved in the research context to grant the research team access to collecting data. As mentioned in previous sections of this guide, establishing rapport is an essential component of qualitative research as it shapes the overall potential for collecting and analyzing data.

Collecting data

Gathering data in case study research often involves multiple sources of evidence, including documents, archival records, interviews, observations, and physical artifacts. This allows for a comprehensive understanding of the case. The process for gathering data should be systematic and carefully documented to ensure the reliability and validity of the study.

Analyzing and interpreting data

The next step is analyzing the data. This involves organizing the data , categorizing it into themes or patterns , and interpreting these patterns to answer the research question. The analysis might also involve comparing the findings with prior research or theoretical propositions.

Writing the case study report

The final step is writing the case study report . This should provide a detailed description of the case, the data, the analysis process, and the findings. The report should be clear, organized, and carefully written to ensure that the reader can understand the case and the conclusions drawn from it.

Each of these steps is crucial in ensuring that the case study research is rigorous, reliable, and provides valuable insights about the case.

The type, depth, and quality of data in your study can significantly influence the validity and utility of the study. In case study research, data is usually collected from multiple sources to provide a comprehensive and nuanced understanding of the case. This section will outline the various methods of collecting data used in case study research and discuss considerations for ensuring the quality of the data.

Interviews are a common method of gathering data in case study research. They can provide rich, in-depth data about the perspectives, experiences, and interpretations of the individuals involved in the case. Interviews can be structured , semi-structured , or unstructured , depending on the research question and the degree of flexibility needed.

Observations

Observations involve the researcher observing the case in its natural setting, providing first-hand information about the case and its context. Observations can provide data that might not be revealed in interviews or documents, such as non-verbal cues or contextual information.

Documents and artifacts

Documents and archival records provide a valuable source of data in case study research. They can include reports, letters, memos, meeting minutes, email correspondence, and various public and private documents related to the case.

These records can provide historical context, corroborate evidence from other sources, and offer insights into the case that might not be apparent from interviews or observations.

Physical artifacts refer to any physical evidence related to the case, such as tools, products, or physical environments. These artifacts can provide tangible insights into the case, complementing the data gathered from other sources.

Ensuring the quality of data collection

Determining the quality of data in case study research requires careful planning and execution. It's crucial to ensure that the data is reliable, accurate, and relevant to the research question. This involves selecting appropriate methods of collecting data, properly training interviewers or observers, and systematically recording and storing the data. It also includes considering ethical issues related to collecting and handling data, such as obtaining informed consent and ensuring the privacy and confidentiality of the participants.

Data analysis

Analyzing case study research involves making sense of the rich, detailed data to answer the research question. This process can be challenging due to the volume and complexity of case study data. However, a systematic and rigorous approach to analysis can ensure that the findings are credible and meaningful. This section outlines the main steps and considerations in analyzing data in case study research.

Organizing the data

The first step in the analysis is organizing the data. This involves sorting the data into manageable sections, often according to the data source or the theme. This step can also involve transcribing interviews, digitizing physical artifacts, or organizing observational data.

Categorizing and coding the data

Once the data is organized, the next step is to categorize or code the data. This involves identifying common themes, patterns, or concepts in the data and assigning codes to relevant data segments. Coding can be done manually or with the help of software tools, and in either case, qualitative analysis software can greatly facilitate the entire coding process. Coding helps to reduce the data to a set of themes or categories that can be more easily analyzed.

Identifying patterns and themes

After coding the data, the researcher looks for patterns or themes in the coded data. This involves comparing and contrasting the codes and looking for relationships or patterns among them. The identified patterns and themes should help answer the research question.

Interpreting the data

Once patterns and themes have been identified, the next step is to interpret these findings. This involves explaining what the patterns or themes mean in the context of the research question and the case. This interpretation should be grounded in the data, but it can also involve drawing on theoretical concepts or prior research.

Verification of the data

The last step in the analysis is verification. This involves checking the accuracy and consistency of the analysis process and confirming that the findings are supported by the data. This can involve re-checking the original data, checking the consistency of codes, or seeking feedback from research participants or peers.

Like any research method , case study research has its strengths and limitations. Researchers must be aware of these, as they can influence the design, conduct, and interpretation of the study.

Understanding the strengths and limitations of case study research can also guide researchers in deciding whether this approach is suitable for their research question . This section outlines some of the key strengths and limitations of case study research.

Benefits include the following:

Rich, detailed data: One of the main strengths of case study research is that it can generate rich, detailed data about the case. This can provide a deep understanding of the case and its context, which can be valuable in exploring complex phenomena.
Flexibility: Case study research is flexible in terms of design , data collection , and analysis . A sufficient degree of flexibility allows the researcher to adapt the study according to the case and the emerging findings.
Real-world context: Case study research involves studying the case in its real-world context, which can provide valuable insights into the interplay between the case and its context.
Multiple sources of evidence: Case study research often involves collecting data from multiple sources , which can enhance the robustness and validity of the findings.

On the other hand, researchers should consider the following limitations:

Generalizability: A common criticism of case study research is that its findings might not be generalizable to other cases due to the specificity and uniqueness of each case.
Time and resource intensive: Case study research can be time and resource intensive due to the depth of the investigation and the amount of collected data.
Complexity of analysis: The rich, detailed data generated in case study research can make analyzing the data challenging.
Subjectivity: Given the nature of case study research, there may be a higher degree of subjectivity in interpreting the data , so researchers need to reflect on this and transparently convey to audiences how the research was conducted.

Being aware of these strengths and limitations can help researchers design and conduct case study research effectively and interpret and report the findings appropriately.

Ready to analyze your data with ATLAS.ti?

See how our intuitive software can draw key insights from your data with a free trial today.

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Publications
Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

Advanced Search
Journal List
Can J Hosp Pharm
v.68(3); May-Jun 2015

Qualitative Research: Data Collection, Analysis, and Management

Introduction.

In an earlier paper, 1 we presented an introduction to using qualitative research methods in pharmacy practice. In this article, we review some principles of the collection, analysis, and management of qualitative data to help pharmacists interested in doing research in their practice to continue their learning in this area. Qualitative research can help researchers to access the thoughts and feelings of research participants, which can enable development of an understanding of the meaning that people ascribe to their experiences. Whereas quantitative research methods can be used to determine how many people undertake particular behaviours, qualitative methods can help researchers to understand how and why such behaviours take place. Within the context of pharmacy practice research, qualitative approaches have been used to examine a diverse array of topics, including the perceptions of key stakeholders regarding prescribing by pharmacists and the postgraduation employment experiences of young pharmacists (see “Further Reading” section at the end of this article).

In the previous paper, 1 we outlined 3 commonly used methodologies: ethnography 2 , grounded theory 3 , and phenomenology. 4 Briefly, ethnography involves researchers using direct observation to study participants in their “real life” environment, sometimes over extended periods. Grounded theory and its later modified versions (e.g., Strauss and Corbin 5 ) use face-to-face interviews and interactions such as focus groups to explore a particular research phenomenon and may help in clarifying a less-well-understood problem, situation, or context. Phenomenology shares some features with grounded theory (such as an exploration of participants’ behaviour) and uses similar techniques to collect data, but it focuses on understanding how human beings experience their world. It gives researchers the opportunity to put themselves in another person’s shoes and to understand the subjective experiences of participants. 6 Some researchers use qualitative methodologies but adopt a different standpoint, and an example of this appears in the work of Thurston and others, 7 discussed later in this paper.

Qualitative work requires reflection on the part of researchers, both before and during the research process, as a way of providing context and understanding for readers. When being reflexive, researchers should not try to simply ignore or avoid their own biases (as this would likely be impossible); instead, reflexivity requires researchers to reflect upon and clearly articulate their position and subjectivities (world view, perspectives, biases), so that readers can better understand the filters through which questions were asked, data were gathered and analyzed, and findings were reported. From this perspective, bias and subjectivity are not inherently negative but they are unavoidable; as a result, it is best that they be articulated up-front in a manner that is clear and coherent for readers.

THE PARTICIPANT’S VIEWPOINT

What qualitative study seeks to convey is why people have thoughts and feelings that might affect the way they behave. Such study may occur in any number of contexts, but here, we focus on pharmacy practice and the way people behave with regard to medicines use (e.g., to understand patients’ reasons for nonadherence with medication therapy or to explore physicians’ resistance to pharmacists’ clinical suggestions). As we suggested in our earlier article, 1 an important point about qualitative research is that there is no attempt to generalize the findings to a wider population. Qualitative research is used to gain insights into people’s feelings and thoughts, which may provide the basis for a future stand-alone qualitative study or may help researchers to map out survey instruments for use in a quantitative study. It is also possible to use different types of research in the same study, an approach known as “mixed methods” research, and further reading on this topic may be found at the end of this paper.

The role of the researcher in qualitative research is to attempt to access the thoughts and feelings of study participants. This is not an easy task, as it involves asking people to talk about things that may be very personal to them. Sometimes the experiences being explored are fresh in the participant’s mind, whereas on other occasions reliving past experiences may be difficult. However the data are being collected, a primary responsibility of the researcher is to safeguard participants and their data. Mechanisms for such safeguarding must be clearly articulated to participants and must be approved by a relevant research ethics review board before the research begins. Researchers and practitioners new to qualitative research should seek advice from an experienced qualitative researcher before embarking on their project.

DATA COLLECTION

Whatever philosophical standpoint the researcher is taking and whatever the data collection method (e.g., focus group, one-to-one interviews), the process will involve the generation of large amounts of data. In addition to the variety of study methodologies available, there are also different ways of making a record of what is said and done during an interview or focus group, such as taking handwritten notes or video-recording. If the researcher is audio- or video-recording data collection, then the recordings must be transcribed verbatim before data analysis can begin. As a rough guide, it can take an experienced researcher/transcriber 8 hours to transcribe one 45-minute audio-recorded interview, a process than will generate 20–30 pages of written dialogue.

Many researchers will also maintain a folder of “field notes” to complement audio-taped interviews. Field notes allow the researcher to maintain and comment upon impressions, environmental contexts, behaviours, and nonverbal cues that may not be adequately captured through the audio-recording; they are typically handwritten in a small notebook at the same time the interview takes place. Field notes can provide important context to the interpretation of audio-taped data and can help remind the researcher of situational factors that may be important during data analysis. Such notes need not be formal, but they should be maintained and secured in a similar manner to audio tapes and transcripts, as they contain sensitive information and are relevant to the research. For more information about collecting qualitative data, please see the “Further Reading” section at the end of this paper.

DATA ANALYSIS AND MANAGEMENT

If, as suggested earlier, doing qualitative research is about putting oneself in another person’s shoes and seeing the world from that person’s perspective, the most important part of data analysis and management is to be true to the participants. It is their voices that the researcher is trying to hear, so that they can be interpreted and reported on for others to read and learn from. To illustrate this point, consider the anonymized transcript excerpt presented in Appendix 1 , which is taken from a research interview conducted by one of the authors (J.S.). We refer to this excerpt throughout the remainder of this paper to illustrate how data can be managed, analyzed, and presented.

Interpretation of Data

Interpretation of the data will depend on the theoretical standpoint taken by researchers. For example, the title of the research report by Thurston and others, 7 “Discordant indigenous and provider frames explain challenges in improving access to arthritis care: a qualitative study using constructivist grounded theory,” indicates at least 2 theoretical standpoints. The first is the culture of the indigenous population of Canada and the place of this population in society, and the second is the social constructivist theory used in the constructivist grounded theory method. With regard to the first standpoint, it can be surmised that, to have decided to conduct the research, the researchers must have felt that there was anecdotal evidence of differences in access to arthritis care for patients from indigenous and non-indigenous backgrounds. With regard to the second standpoint, it can be surmised that the researchers used social constructivist theory because it assumes that behaviour is socially constructed; in other words, people do things because of the expectations of those in their personal world or in the wider society in which they live. (Please see the “Further Reading” section for resources providing more information about social constructivist theory and reflexivity.) Thus, these 2 standpoints (and there may have been others relevant to the research of Thurston and others 7 ) will have affected the way in which these researchers interpreted the experiences of the indigenous population participants and those providing their care. Another standpoint is feminist standpoint theory which, among other things, focuses on marginalized groups in society. Such theories are helpful to researchers, as they enable us to think about things from a different perspective. Being aware of the standpoints you are taking in your own research is one of the foundations of qualitative work. Without such awareness, it is easy to slip into interpreting other people’s narratives from your own viewpoint, rather than that of the participants.

To analyze the example in Appendix 1 , we will adopt a phenomenological approach because we want to understand how the participant experienced the illness and we want to try to see the experience from that person’s perspective. It is important for the researcher to reflect upon and articulate his or her starting point for such analysis; for example, in the example, the coder could reflect upon her own experience as a female of a majority ethnocultural group who has lived within middle class and upper middle class settings. This personal history therefore forms the filter through which the data will be examined. This filter does not diminish the quality or significance of the analysis, since every researcher has his or her own filters; however, by explicitly stating and acknowledging what these filters are, the researcher makes it easer for readers to contextualize the work.

Transcribing and Checking

For the purposes of this paper it is assumed that interviews or focus groups have been audio-recorded. As mentioned above, transcribing is an arduous process, even for the most experienced transcribers, but it must be done to convert the spoken word to the written word to facilitate analysis. For anyone new to conducting qualitative research, it is beneficial to transcribe at least one interview and one focus group. It is only by doing this that researchers realize how difficult the task is, and this realization affects their expectations when asking others to transcribe. If the research project has sufficient funding, then a professional transcriber can be hired to do the work. If this is the case, then it is a good idea to sit down with the transcriber, if possible, and talk through the research and what the participants were talking about. This background knowledge for the transcriber is especially important in research in which people are using jargon or medical terms (as in pharmacy practice). Involving your transcriber in this way makes the work both easier and more rewarding, as he or she will feel part of the team. Transcription editing software is also available, but it is expensive. For example, ELAN (more formally known as EUDICO Linguistic Annotator, developed at the Technical University of Berlin) 8 is a tool that can help keep data organized by linking media and data files (particularly valuable if, for example, video-taping of interviews is complemented by transcriptions). It can also be helpful in searching complex data sets. Products such as ELAN do not actually automatically transcribe interviews or complete analyses, and they do require some time and effort to learn; nonetheless, for some research applications, it may be a valuable to consider such software tools.

All audio recordings should be transcribed verbatim, regardless of how intelligible the transcript may be when it is read back. Lines of text should be numbered. Once the transcription is complete, the researcher should read it while listening to the recording and do the following: correct any spelling or other errors; anonymize the transcript so that the participant cannot be identified from anything that is said (e.g., names, places, significant events); insert notations for pauses, laughter, looks of discomfort; insert any punctuation, such as commas and full stops (periods) (see Appendix 1 for examples of inserted punctuation), and include any other contextual information that might have affected the participant (e.g., temperature or comfort of the room).

Dealing with the transcription of a focus group is slightly more difficult, as multiple voices are involved. One way of transcribing such data is to “tag” each voice (e.g., Voice A, Voice B). In addition, the focus group will usually have 2 facilitators, whose respective roles will help in making sense of the data. While one facilitator guides participants through the topic, the other can make notes about context and group dynamics. More information about group dynamics and focus groups can be found in resources listed in the “Further Reading” section.

Reading between the Lines

During the process outlined above, the researcher can begin to get a feel for the participant’s experience of the phenomenon in question and can start to think about things that could be pursued in subsequent interviews or focus groups (if appropriate). In this way, one participant’s narrative informs the next, and the researcher can continue to interview until nothing new is being heard or, as it says in the text books, “saturation is reached”. While continuing with the processes of coding and theming (described in the next 2 sections), it is important to consider not just what the person is saying but also what they are not saying. For example, is a lengthy pause an indication that the participant is finding the subject difficult, or is the person simply deciding what to say? The aim of the whole process from data collection to presentation is to tell the participants’ stories using exemplars from their own narratives, thus grounding the research findings in the participants’ lived experiences.

Smith 9 suggested a qualitative research method known as interpretative phenomenological analysis, which has 2 basic tenets: first, that it is rooted in phenomenology, attempting to understand the meaning that individuals ascribe to their lived experiences, and second, that the researcher must attempt to interpret this meaning in the context of the research. That the researcher has some knowledge and expertise in the subject of the research means that he or she can have considerable scope in interpreting the participant’s experiences. Larkin and others 10 discussed the importance of not just providing a description of what participants say. Rather, interpretative phenomenological analysis is about getting underneath what a person is saying to try to truly understand the world from his or her perspective.

Once all of the research interviews have been transcribed and checked, it is time to begin coding. Field notes compiled during an interview can be a useful complementary source of information to facilitate this process, as the gap in time between an interview, transcribing, and coding can result in memory bias regarding nonverbal or environmental context issues that may affect interpretation of data.

Coding refers to the identification of topics, issues, similarities, and differences that are revealed through the participants’ narratives and interpreted by the researcher. This process enables the researcher to begin to understand the world from each participant’s perspective. Coding can be done by hand on a hard copy of the transcript, by making notes in the margin or by highlighting and naming sections of text. More commonly, researchers use qualitative research software (e.g., NVivo, QSR International Pty Ltd; www.qsrinternational.com/products_nvivo.aspx ) to help manage their transcriptions. It is advised that researchers undertake a formal course in the use of such software or seek supervision from a researcher experienced in these tools.

Returning to Appendix 1 and reading from lines 8–11, a code for this section might be “diagnosis of mental health condition”, but this would just be a description of what the participant is talking about at that point. If we read a little more deeply, we can ask ourselves how the participant might have come to feel that the doctor assumed he or she was aware of the diagnosis or indeed that they had only just been told the diagnosis. There are a number of pauses in the narrative that might suggest the participant is finding it difficult to recall that experience. Later in the text, the participant says “nobody asked me any questions about my life” (line 19). This could be coded simply as “health care professionals’ consultation skills”, but that would not reflect how the participant must have felt never to be asked anything about his or her personal life, about the participant as a human being. At the end of this excerpt, the participant just trails off, recalling that no-one showed any interest, which makes for very moving reading. For practitioners in pharmacy, it might also be pertinent to explore the participant’s experience of akathisia and why this was left untreated for 20 years.

One of the questions that arises about qualitative research relates to the reliability of the interpretation and representation of the participants’ narratives. There are no statistical tests that can be used to check reliability and validity as there are in quantitative research. However, work by Lincoln and Guba 11 suggests that there are other ways to “establish confidence in the ‘truth’ of the findings” (p. 218). They call this confidence “trustworthiness” and suggest that there are 4 criteria of trustworthiness: credibility (confidence in the “truth” of the findings), transferability (showing that the findings have applicability in other contexts), dependability (showing that the findings are consistent and could be repeated), and confirmability (the extent to which the findings of a study are shaped by the respondents and not researcher bias, motivation, or interest).

One way of establishing the “credibility” of the coding is to ask another researcher to code the same transcript and then to discuss any similarities and differences in the 2 resulting sets of codes. This simple act can result in revisions to the codes and can help to clarify and confirm the research findings.

Theming refers to the drawing together of codes from one or more transcripts to present the findings of qualitative research in a coherent and meaningful way. For example, there may be examples across participants’ narratives of the way in which they were treated in hospital, such as “not being listened to” or “lack of interest in personal experiences” (see Appendix 1 ). These may be drawn together as a theme running through the narratives that could be named “the patient’s experience of hospital care”. The importance of going through this process is that at its conclusion, it will be possible to present the data from the interviews using quotations from the individual transcripts to illustrate the source of the researchers’ interpretations. Thus, when the findings are organized for presentation, each theme can become the heading of a section in the report or presentation. Underneath each theme will be the codes, examples from the transcripts, and the researcher’s own interpretation of what the themes mean. Implications for real life (e.g., the treatment of people with chronic mental health problems) should also be given.

DATA SYNTHESIS

In this final section of this paper, we describe some ways of drawing together or “synthesizing” research findings to represent, as faithfully as possible, the meaning that participants ascribe to their life experiences. This synthesis is the aim of the final stage of qualitative research. For most readers, the synthesis of data presented by the researcher is of crucial significance—this is usually where “the story” of the participants can be distilled, summarized, and told in a manner that is both respectful to those participants and meaningful to readers. There are a number of ways in which researchers can synthesize and present their findings, but any conclusions drawn by the researchers must be supported by direct quotations from the participants. In this way, it is made clear to the reader that the themes under discussion have emerged from the participants’ interviews and not the mind of the researcher. The work of Latif and others 12 gives an example of how qualitative research findings might be presented.

Planning and Writing the Report

As has been suggested above, if researchers code and theme their material appropriately, they will naturally find the headings for sections of their report. Qualitative researchers tend to report “findings” rather than “results”, as the latter term typically implies that the data have come from a quantitative source. The final presentation of the research will usually be in the form of a report or a paper and so should follow accepted academic guidelines. In particular, the article should begin with an introduction, including a literature review and rationale for the research. There should be a section on the chosen methodology and a brief discussion about why qualitative methodology was most appropriate for the study question and why one particular methodology (e.g., interpretative phenomenological analysis rather than grounded theory) was selected to guide the research. The method itself should then be described, including ethics approval, choice of participants, mode of recruitment, and method of data collection (e.g., semistructured interviews or focus groups), followed by the research findings, which will be the main body of the report or paper. The findings should be written as if a story is being told; as such, it is not necessary to have a lengthy discussion section at the end. This is because much of the discussion will take place around the participants’ quotes, such that all that is needed to close the report or paper is a summary, limitations of the research, and the implications that the research has for practice. As stated earlier, it is not the intention of qualitative research to allow the findings to be generalized, and therefore this is not, in itself, a limitation.

Planning out the way that findings are to be presented is helpful. It is useful to insert the headings of the sections (the themes) and then make a note of the codes that exemplify the thoughts and feelings of your participants. It is generally advisable to put in the quotations that you want to use for each theme, using each quotation only once. After all this is done, the telling of the story can begin as you give your voice to the experiences of the participants, writing around their quotations. Do not be afraid to draw assumptions from the participants’ narratives, as this is necessary to give an in-depth account of the phenomena in question. Discuss these assumptions, drawing on your participants’ words to support you as you move from one code to another and from one theme to the next. Finally, as appropriate, it is possible to include examples from literature or policy documents that add support for your findings. As an exercise, you may wish to code and theme the sample excerpt in Appendix 1 and tell the participant’s story in your own way. Further reading about “doing” qualitative research can be found at the end of this paper.

CONCLUSIONS

Qualitative research can help researchers to access the thoughts and feelings of research participants, which can enable development of an understanding of the meaning that people ascribe to their experiences. It can be used in pharmacy practice research to explore how patients feel about their health and their treatment. Qualitative research has been used by pharmacists to explore a variety of questions and problems (see the “Further Reading” section for examples). An understanding of these issues can help pharmacists and other health care professionals to tailor health care to match the individual needs of patients and to develop a concordant relationship. Doing qualitative research is not easy and may require a complete rethink of how research is conducted, particularly for researchers who are more familiar with quantitative approaches. There are many ways of conducting qualitative research, and this paper has covered some of the practical issues regarding data collection, analysis, and management. Further reading around the subject will be essential to truly understand this method of accessing peoples’ thoughts and feelings to enable researchers to tell participants’ stories.

Appendix 1. Excerpt from a sample transcript

The participant (age late 50s) had suffered from a chronic mental health illness for 30 years. The participant had become a “revolving door patient,” someone who is frequently in and out of hospital. As the participant talked about past experiences, the researcher asked:

What was treatment like 30 years ago?
Umm—well it was pretty much they could do what they wanted with you because I was put into the er, the er kind of system er, I was just on
endless section threes.
Really…
But what I didn’t realize until later was that if you haven’t actually posed a threat to someone or yourself they can’t really do that but I didn’t know
that. So wh-when I first went into hospital they put me on the forensic ward ’cause they said, “We don’t think you’ll stay here we think you’ll just
run-run away.” So they put me then onto the acute admissions ward and – er – I can remember one of the first things I recall when I got onto that
ward was sitting down with a er a Dr XXX. He had a book this thick [gestures] and on each page it was like three questions and he went through
all these questions and I answered all these questions. So we’re there for I don’t maybe two hours doing all that and he asked me he said “well
when did somebody tell you then that you have schizophrenia” I said “well nobody’s told me that” so he seemed very surprised but nobody had
actually [pause] whe-when I first went up there under police escort erm the senior kind of consultants people I’d been to where I was staying and
ermm so er [pause] I . . . the, I can remember the very first night that I was there and given this injection in this muscle here [gestures] and just
having dreadful side effects the next day I woke up [pause]
. . . and I suffered that akathesia I swear to you, every minute of every day for about 20 years.
Oh how awful.
And that side of it just makes life impossible so the care on the wards [pause] umm I don’t know it’s kind of, it’s kind of hard to put into words
[pause]. Because I’m not saying they were sort of like not friendly or interested but then nobody ever seemed to want to talk about your life [pause]
nobody asked me any questions about my life. The only questions that came into was they asked me if I’d be a volunteer for these student exams
and things and I said “yeah” so all the questions were like “oh what jobs have you done,” er about your relationships and things and er but
nobody actually sat down and had a talk and showed some interest in you as a person you were just there basically [pause] um labelled and you
know there was there was [pause] but umm [pause] yeah . . .

This article is the 10th in the CJHP Research Primer Series, an initiative of the CJHP Editorial Board and the CSHP Research Committee. The planned 2-year series is intended to appeal to relatively inexperienced researchers, with the goal of building research capacity among practising pharmacists. The articles, presenting simple but rigorous guidance to encourage and support novice researchers, are being solicited from authors with appropriate expertise.

Previous articles in this series:

Bond CM. The research jigsaw: how to get started. Can J Hosp Pharm . 2014;67(1):28–30.

Tully MP. Research: articulating questions, generating hypotheses, and choosing study designs. Can J Hosp Pharm . 2014;67(1):31–4.

Loewen P. Ethical issues in pharmacy practice research: an introductory guide. Can J Hosp Pharm. 2014;67(2):133–7.

Tsuyuki RT. Designing pharmacy practice research trials. Can J Hosp Pharm . 2014;67(3):226–9.

Bresee LC. An introduction to developing surveys for pharmacy practice research. Can J Hosp Pharm . 2014;67(4):286–91.

Gamble JM. An introduction to the fundamentals of cohort and case–control studies. Can J Hosp Pharm . 2014;67(5):366–72.

Austin Z, Sutton J. Qualitative research: getting started. C an J Hosp Pharm . 2014;67(6):436–40.

Houle S. An introduction to the fundamentals of randomized controlled trials in pharmacy research. Can J Hosp Pharm . 2014; 68(1):28–32.

Charrois TL. Systematic reviews: What do you need to know to get started? Can J Hosp Pharm . 2014;68(2):144–8.

Competing interests: None declared.

Qualitative Research in General

Breakwell GM, Hammond S, Fife-Schaw C. Research methods in psychology. Thousand Oaks (CA): Sage Publications; 1995. [ Google Scholar ]
Given LM. 100 questions (and answers) about qualitative research. Thousand Oaks (CA): Sage Publications; 2015. [ Google Scholar ]
Miles B, Huberman AM. Qualitative data analysis. Thousand Oaks (CA): Sage Publications; 2009. [ Google Scholar ]
Patton M. Qualitative research and evaluation methods. Thousand Oaks (CA): Sage Publications; 2002. [ Google Scholar ]
Willig C. Introducing qualitative research in psychology. Buckingham (UK): Open University Press; 2001. [ Google Scholar ]

Group Dynamics in Focus Groups

Farnsworth J, Boon B. Analysing group dynamics within the focus group. Qual Res. 2010; 10 (5):605–24. [ Google Scholar ]

Social Constructivism

Social constructivism. Berkeley (CA): University of California, Berkeley, Berkeley Graduate Division, Graduate Student Instruction Teaching & Resource Center; [cited 2015 June 4]. Available from: http://gsi.berkeley.edu/gsi-guide-contents/learning-theory-research/social-constructivism/ [ Google Scholar ]

Mixed Methods

Creswell J. Research design: qualitative, quantitative, and mixed methods approaches. Thousand Oaks (CA): Sage Publications; 2009. [ Google Scholar ]

Collecting Qualitative Data

Arksey H, Knight P. Interviewing for social scientists: an introductory resource with examples. Thousand Oaks (CA): Sage Publications; 1999. [ Google Scholar ]
Guest G, Namey EE, Mitchel ML. Collecting qualitative data: a field manual for applied research. Thousand Oaks (CA): Sage Publications; 2013. [ Google Scholar ]

Constructivist Grounded Theory

Charmaz K. Grounded theory: objectivist and constructivist methods. In: Denzin N, Lincoln Y, editors. Handbook of qualitative research. 2nd ed. Thousand Oaks (CA): Sage Publications; 2000. pp. 509–35. [ Google Scholar ]

Case Study Research Method in Psychology

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Learn about our Editorial Process

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

On This Page:

Case studies are in-depth investigations of a person, group, event, or community. Typically, data is gathered from various sources using several methods (e.g., observations & interviews).

The case study research method originated in clinical medicine (the case history, i.e., the patient’s personal history). In psychology, case studies are often confined to the study of a particular individual.

The information is mainly biographical and relates to events in the individual’s past (i.e., retrospective), as well as to significant events that are currently occurring in his or her everyday life.

The case study is not a research method, but researchers select methods of data collection and analysis that will generate material suitable for case studies.

Freud (1909a, 1909b) conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.

This makes it clear that the case study is a method that should only be used by a psychologist, therapist, or psychiatrist, i.e., someone with a professional qualification.

There is an ethical issue of competence. Only someone qualified to diagnose and treat a person can conduct a formal case study relating to atypical (i.e., abnormal) behavior or atypical development.

Famous Case Studies

Anna O – One of the most famous case studies, documenting psychoanalyst Josef Breuer’s treatment of “Anna O” (real name Bertha Pappenheim) for hysteria in the late 1800s using early psychoanalytic theory.
Little Hans – A child psychoanalysis case study published by Sigmund Freud in 1909 analyzing his five-year-old patient Herbert Graf’s house phobia as related to the Oedipus complex.
Bruce/Brenda – Gender identity case of the boy (Bruce) whose botched circumcision led psychologist John Money to advise gender reassignment and raise him as a girl (Brenda) in the 1960s.
Genie Wiley – Linguistics/psychological development case of the victim of extreme isolation abuse who was studied in 1970s California for effects of early language deprivation on acquiring speech later in life.
Phineas Gage – One of the most famous neuropsychology case studies analyzes personality changes in railroad worker Phineas Gage after an 1848 brain injury involving a tamping iron piercing his skull.

Clinical Case Studies

Studying the effectiveness of psychotherapy approaches with an individual patient
Assessing and treating mental illnesses like depression, anxiety disorders, PTSD
Neuropsychological cases investigating brain injuries or disorders

Child Psychology Case Studies

Studying psychological development from birth through adolescence
Cases of learning disabilities, autism spectrum disorders, ADHD
Effects of trauma, abuse, deprivation on development

Types of Case Studies

Explanatory case studies : Used to explore causation in order to find underlying principles. Helpful for doing qualitative analysis to explain presumed causal links.
Exploratory case studies : Used to explore situations where an intervention being evaluated has no clear set of outcomes. It helps define questions and hypotheses for future research.
Descriptive case studies : Describe an intervention or phenomenon and the real-life context in which it occurred. It is helpful for illustrating certain topics within an evaluation.
Multiple-case studies : Used to explore differences between cases and replicate findings across cases. Helpful for comparing and contrasting specific cases.
Intrinsic : Used to gain a better understanding of a particular case. Helpful for capturing the complexity of a single case.
Collective : Used to explore a general phenomenon using multiple case studies. Helpful for jointly studying a group of cases in order to inquire into the phenomenon.

Where Do You Find Data for a Case Study?

There are several places to find data for a case study. The key is to gather data from multiple sources to get a complete picture of the case and corroborate facts or findings through triangulation of evidence. Most of this information is likely qualitative (i.e., verbal description rather than measurement), but the psychologist might also collect numerical data.

1. Primary sources

Interviews – Interviewing key people related to the case to get their perspectives and insights. The interview is an extremely effective procedure for obtaining information about an individual, and it may be used to collect comments from the person’s friends, parents, employer, workmates, and others who have a good knowledge of the person, as well as to obtain facts from the person him or herself.
Observations – Observing behaviors, interactions, processes, etc., related to the case as they unfold in real-time.
Documents & Records – Reviewing private documents, diaries, public records, correspondence, meeting minutes, etc., relevant to the case.

2. Secondary sources

News/Media – News coverage of events related to the case study.
Academic articles – Journal articles, dissertations etc. that discuss the case.
Government reports – Official data and records related to the case context.
Books/films – Books, documentaries or films discussing the case.

3. Archival records

Searching historical archives, museum collections and databases to find relevant documents, visual/audio records related to the case history and context.

Public archives like newspapers, organizational records, photographic collections could all include potentially relevant pieces of information to shed light on attitudes, cultural perspectives, common practices and historical contexts related to psychology.

4. Organizational records

Organizational records offer the advantage of often having large datasets collected over time that can reveal or confirm psychological insights.

Of course, privacy and ethical concerns regarding confidential data must be navigated carefully.

However, with proper protocols, organizational records can provide invaluable context and empirical depth to qualitative case studies exploring the intersection of psychology and organizations.

Organizational/industrial psychology research : Organizational records like employee surveys, turnover/retention data, policies, incident reports etc. may provide insight into topics like job satisfaction, workplace culture and dynamics, leadership issues, employee behaviors etc.
Clinical psychology : Therapists/hospitals may grant access to anonymized medical records to study aspects like assessments, diagnoses, treatment plans etc. This could shed light on clinical practices.
School psychology : Studies could utilize anonymized student records like test scores, grades, disciplinary issues, and counseling referrals to study child development, learning barriers, effectiveness of support programs, and more.

How do I Write a Case Study in Psychology?

Follow specified case study guidelines provided by a journal or your psychology tutor. General components of clinical case studies include: background, symptoms, assessments, diagnosis, treatment, and outcomes. Interpreting the information means the researcher decides what to include or leave out. A good case study should always clarify which information is the factual description and which is an inference or the researcher’s opinion.

1. Introduction

Provide background on the case context and why it is of interest, presenting background information like demographics, relevant history, and presenting problem.
Compare briefly to similar published cases if applicable. Clearly state the focus/importance of the case.

2. Case Presentation

Describe the presenting problem in detail, including symptoms, duration,and impact on daily life.
Include client demographics like age and gender, information about social relationships, and mental health history.
Describe all physical, emotional, and/or sensory symptoms reported by the client.
Use patient quotes to describe the initial complaint verbatim. Follow with full-sentence summaries of relevant history details gathered, including key components that led to a working diagnosis.
Summarize clinical exam results, namely orthopedic/neurological tests, imaging, lab tests, etc. Note actual results rather than subjective conclusions. Provide images if clearly reproducible/anonymized.
Clearly state the working diagnosis or clinical impression before transitioning to management.

3. Management and Outcome

Indicate the total duration of care and number of treatments given over what timeframe. Use specific names/descriptions for any therapies/interventions applied.
Present the results of the intervention,including any quantitative or qualitative data collected.
For outcomes, utilize visual analog scales for pain, medication usage logs, etc., if possible. Include patient self-reports of improvement/worsening of symptoms. Note the reason for discharge/end of care.

4. Discussion

Analyze the case, exploring contributing factors, limitations of the study, and connections to existing research.
Analyze the effectiveness of the intervention,considering factors like participant adherence, limitations of the study, and potential alternative explanations for the results.
Identify any questions raised in the case analysis and relate insights to established theories and current research if applicable. Avoid definitive claims about physiological explanations.
Offer clinical implications, and suggest future research directions.

5. Additional Items

Thank specific assistants for writing support only. No patient acknowledgments.
References should directly support any key claims or quotes included.
Use tables/figures/images only if substantially informative. Include permissions and legends/explanatory notes.
Provides detailed (rich qualitative) information.
Provides insight for further research.
Permitting investigation of otherwise impractical (or unethical) situations.

Case studies allow a researcher to investigate a topic in far more detail than might be possible if they were trying to deal with a large number of research participants (nomothetic approach) with the aim of ‘averaging’.

Because of their in-depth, multi-sided approach, case studies often shed light on aspects of human thinking and behavior that would be unethical or impractical to study in other ways.

Research that only looks into the measurable aspects of human behavior is not likely to give us insights into the subjective dimension of experience, which is important to psychoanalytic and humanistic psychologists.

Case studies are often used in exploratory research. They can help us generate new ideas (that might be tested by other methods). They are an important way of illustrating theories and can help show how different aspects of a person’s life are related to each other.

The method is, therefore, important for psychologists who adopt a holistic point of view (i.e., humanistic psychologists ).

Limitations

Lacking scientific rigor and providing little basis for generalization of results to the wider population.
Researchers’ own subjective feelings may influence the case study (researcher bias).
Difficult to replicate.
Time-consuming and expensive.
The volume of data, together with the time restrictions in place, impacted the depth of analysis that was possible within the available resources.

Because a case study deals with only one person/event/group, we can never be sure if the case study investigated is representative of the wider body of “similar” instances. This means the conclusions drawn from a particular case may not be transferable to other settings.

Because case studies are based on the analysis of qualitative (i.e., descriptive) data , a lot depends on the psychologist’s interpretation of the information she has acquired.

This means that there is a lot of scope for Anna O , and it could be that the subjective opinions of the psychologist intrude in the assessment of what the data means.

For example, Freud has been criticized for producing case studies in which the information was sometimes distorted to fit particular behavioral theories (e.g., Little Hans ).

This is also true of Money’s interpretation of the Bruce/Brenda case study (Diamond, 1997) when he ignored evidence that went against his theory.

Breuer, J., & Freud, S. (1895). Studies on hysteria . Standard Edition 2: London.

Curtiss, S. (1981). Genie: The case of a modern wild child .

Diamond, M., & Sigmundson, K. (1997). Sex Reassignment at Birth: Long-term Review and Clinical Implications. Archives of Pediatrics & Adolescent Medicine , 151(3), 298-304

Freud, S. (1909a). Analysis of a phobia of a five year old boy. In The Pelican Freud Library (1977), Vol 8, Case Histories 1, pages 169-306

Freud, S. (1909b). Bemerkungen über einen Fall von Zwangsneurose (Der “Rattenmann”). Jb. psychoanal. psychopathol. Forsch ., I, p. 357-421; GW, VII, p. 379-463; Notes upon a case of obsessional neurosis, SE , 10: 151-318.

Harlow J. M. (1848). Passage of an iron rod through the head. Boston Medical and Surgical Journal, 39 , 389–393.

Harlow, J. M. (1868). Recovery from the Passage of an Iron Bar through the Head . Publications of the Massachusetts Medical Society. 2 (3), 327-347.

Money, J., & Ehrhardt, A. A. (1972). Man & Woman, Boy & Girl : The Differentiation and Dimorphism of Gender Identity from Conception to Maturity. Baltimore, Maryland: Johns Hopkins University Press.

Money, J., & Tucker, P. (1975). Sexual signatures: On being a man or a woman.

Further Information

Case Study Approach
Case Study Method
Enhancing the Quality of Case Studies in Health Services Research
“We do things together” A case study of “couplehood” in dementia
Using mixed methods for evaluating an integrative approach to cancer care: a case study

Research Methodology

Qualitative Data Coding

What Is a Focus Group?

Cross-Cultural Research Methodology In Psychology

What Is Internal Validity In Research?

Research Methodology , Statistics

What Is Face Validity In Research? Importance & How To Measure

Criterion Validity: Definition & Examples

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, automatically generate references for free.

Knowledge Base
Methodology
Data Collection Methods | Step-by-Step Guide & Examples

Data Collection Methods | Step-by-Step Guide & Examples

Published on 4 May 2022 by Pritha Bhandari .

Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental, or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem .

While methods and aims may differ between fields, the overall process of data collection remains largely the same. Before you begin collecting data, you need to consider:

The aim of the research
The type of data that you will collect
The methods and procedures you will use to collect, store, and process the data

To collect high-quality data that is relevant to your purposes, follow these four steps.

Step 1: define the aim of your research, step 2: choose your data collection method, step 3: plan your data collection procedures, step 4: collect the data, frequently asked questions about data collection.

Before you start the process of data collection, you need to identify exactly what you want to achieve. You can start by writing a problem statement : what is the practical or scientific issue that you want to address, and why does it matter?

Next, formulate one or more research questions that precisely define what you want to find out. Depending on your research questions, you might need to collect quantitative or qualitative data :

Quantitative data is expressed in numbers and graphs and is analysed through statistical methods .
Qualitative data is expressed in words and analysed through interpretations and categorisations.

If your aim is to test a hypothesis , measure something precisely, or gain large-scale statistical insights, collect quantitative data. If your aim is to explore ideas, understand experiences, or gain detailed insights into a specific context, collect qualitative data.

If you have several aims, you can use a mixed methods approach that collects both types of data.

Your first aim is to assess whether there are significant differences in perceptions of managers across different departments and office locations.
Your second aim is to gather meaningful feedback from employees to explore new ideas for how managers can improve.

Prevent plagiarism, run a free check.

Based on the data you want to collect, decide which method is best suited for your research.

Experimental research is primarily a quantitative method.
Interviews , focus groups , and ethnographies are qualitative methods.
Surveys , observations, archival research, and secondary data collection can be quantitative or qualitative methods.

Carefully consider what method you will use to gather data that helps you directly answer your research questions.

When you know which method(s) you are using, you need to plan exactly how you will implement them. What procedures will you follow to make accurate observations or measurements of the variables you are interested in?

For instance, if you’re conducting surveys or interviews, decide what form the questions will take; if you’re conducting an experiment, make decisions about your experimental design .

Operationalisation

Sometimes your variables can be measured directly: for example, you can collect data on the average age of employees simply by asking for dates of birth. However, often you’ll be interested in collecting data on more abstract concepts or variables that can’t be directly observed.

Operationalisation means turning abstract conceptual ideas into measurable observations. When planning how you will collect data, you need to translate the conceptual definition of what you want to study into the operational definition of what you will actually measure.

You ask managers to rate their own leadership skills on 5-point scales assessing the ability to delegate, decisiveness, and dependability.
You ask their direct employees to provide anonymous feedback on the managers regarding the same topics.

You may need to develop a sampling plan to obtain data systematically. This involves defining a population , the group you want to draw conclusions about, and a sample, the group you will actually collect data from.

Your sampling method will determine how you recruit participants or obtain measurements for your study. To decide on a sampling method you will need to consider factors like the required sample size, accessibility of the sample, and time frame of the data collection.

Standardising procedures

If multiple researchers are involved, write a detailed manual to standardise data collection procedures in your study.

This means laying out specific step-by-step instructions so that everyone in your research team collects data in a consistent way – for example, by conducting experiments under the same conditions and using objective criteria to record and categorise observations.

This helps ensure the reliability of your data, and you can also use it to replicate the study in the future.

Creating a data management plan

Before beginning data collection, you should also decide how you will organise and store your data.

If you are collecting data from people, you will likely need to anonymise and safeguard the data to prevent leaks of sensitive information (e.g. names or identity numbers).
If you are collecting data via interviews or pencil-and-paper formats, you will need to perform transcriptions or data entry in systematic ways to minimise distortion.
You can prevent loss of data by having an organisation system that is routinely backed up.

Finally, you can implement your chosen methods to measure or observe the variables you are interested in.

The closed-ended questions ask participants to rate their manager’s leadership skills on scales from 1 to 5. The data produced is numerical and can be statistically analysed for averages and patterns.

To ensure that high-quality data is recorded in a systematic way, here are some best practices:

Record all relevant information as and when you obtain data. For example, note down whether or how lab equipment is recalibrated during an experimental study.
Double-check manual data entry for errors.
If you collect quantitative data, you can assess the reliability and validity to get an indication of your data quality.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organisations.

When conducting research, collecting original data has significant advantages:

You can tailor data collection to your specific research aims (e.g., understanding the needs of your consumers or user testing your website).
You can control and standardise the process for high reliability and validity (e.g., choosing appropriate measurements and sampling methods ).

However, there are also some drawbacks: data collection can be time-consuming, labour-intensive, and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

Reliability and validity are both about how well a method measures something:

Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research , you also have to consider the internal and external validity of your experiment.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

Operationalisation means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioural avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalise the variables that you want to measure.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the ‘Cite this Scribbr article’ button to automatically add the citation to our free Reference Generator.

Bhandari, P. (2022, May 04). Data Collection Methods | Step-by-Step Guide & Examples. Scribbr. Retrieved 14 May 2024, from https://www.scribbr.co.uk/research-methods/data-collection-guide/

Is this article helpful?

Pritha Bhandari

Other students also liked, qualitative vs quantitative research | examples & methods, triangulation in research | guide, types, examples, what is a conceptual framework | tips & examples.

To read this content please select one of the options below:

Please note you do not have access to teaching notes, a systematic qualitative case study: questions, data collection, nvivo analysis and saturation.

Qualitative Research in Organizations and Management

ISSN : 1746-5648

Article publication date: 20 August 2020

Issue publication date: 23 February 2021

The study aims to explore the case study method with the formation of questions, data collection procedures and analysis, followed by how and on which position the saturation is achieved in developing a centralized Shariah governance framework for Islamic banks in Bangladesh.

Design/methodology/approach

Using purposive and snowball sampling procedures, data have been collected from 17 respondents who are working in the central bank and Islamic banks of Bangladesh through face-to-face and semi-structured interviews.

The study claims that researchers can form the research questions by using “what” question mark in qualitative research. Besides, the qualitative research and case study could explore the answers of “what” questions along with the “why” and “how” more broadly, descriptively and extensively about a phenomenon. Similarly, saturation can be considered attaining the ultimate point of data collection by the researchers without adding anything in the databank. Overall, this study proposes three stages of saturation: First, information redundancy. Second, referring the respondents (already considered in the study) without knowing anything about the data collection and their responses. Third, through the NVivo open coding process due to the decrease of reference or quotes in a certain position or in the saturation position as a result of fewer outcomes or insufficient information. The saturation is thus achieved in the diversified positions, i.e. three respondents for regulatory, nine for Shariah scholars and officers and five for the experts concerning the responses and respondents.

Research limitations/implications

The study has potential implications on the qualitative research method, including the case study, saturation process and points, NVivo analysis and qualitative questions formation.

Originality/value

This research defines a case study with the inclusion of “what” and illustrates the saturation process in diverse positions. The qualitative research questions can also be formed with “what” in addition “why” and “how”.

Qualitative research

Acknowledgements

The author would like to thank both Editors, Associate Editor Dr. Amon Barros and anonymous reviewers for their valuable time, constructive comments and suggestions for the improvement of the manuscript. A special thanks to Dr. Mohd Mursyid Arshad, Senior Lecturer at Faculty of Educational Studies, Universiti Putra Malaysia, Malaysia for his suggestions and unconditional help in the NVivo data analysis process of this research project. The study does not receive any specific fund or research grant.

Alam, M.K. (2021), "A systematic qualitative case study: questions, data collection, NVivo analysis and saturation", Qualitative Research in Organizations and Management , Vol. 16 No. 1, pp. 1-31. https://doi.org/10.1108/QROM-09-2019-1825

Emerald Publishing Limited

We’re listening — tell us what you think, something didn’t work….

Report bugs here

All feedback is valuable

Please share your general feedback

Join us on our journey

Platform update page.

Visit emeraldpublishing.com/platformupdate to discover the latest news and updates

Questions & More Information

Answers to the most commonly asked questions here

Home » Data Collection – Methods Types and Examples

Data Collection – Methods Types and Examples

Table of Contents

Data Collection

Definition:

Data collection is the process of gathering and collecting information from various sources to analyze and make informed decisions based on the data collected. This can involve various methods, such as surveys, interviews, experiments, and observation.

In order for data collection to be effective, it is important to have a clear understanding of what data is needed and what the purpose of the data collection is. This can involve identifying the population or sample being studied, determining the variables to be measured, and selecting appropriate methods for collecting and recording data.

Types of Data Collection

Types of Data Collection are as follows:

Primary Data Collection

Primary data collection is the process of gathering original and firsthand information directly from the source or target population. This type of data collection involves collecting data that has not been previously gathered, recorded, or published. Primary data can be collected through various methods such as surveys, interviews, observations, experiments, and focus groups. The data collected is usually specific to the research question or objective and can provide valuable insights that cannot be obtained from secondary data sources. Primary data collection is often used in market research, social research, and scientific research.

Secondary Data Collection

Secondary data collection is the process of gathering information from existing sources that have already been collected and analyzed by someone else, rather than conducting new research to collect primary data. Secondary data can be collected from various sources, such as published reports, books, journals, newspapers, websites, government publications, and other documents.

Qualitative Data Collection

Qualitative data collection is used to gather non-numerical data such as opinions, experiences, perceptions, and feelings, through techniques such as interviews, focus groups, observations, and document analysis. It seeks to understand the deeper meaning and context of a phenomenon or situation and is often used in social sciences, psychology, and humanities. Qualitative data collection methods allow for a more in-depth and holistic exploration of research questions and can provide rich and nuanced insights into human behavior and experiences.

Quantitative Data Collection

Quantitative data collection is a used to gather numerical data that can be analyzed using statistical methods. This data is typically collected through surveys, experiments, and other structured data collection methods. Quantitative data collection seeks to quantify and measure variables, such as behaviors, attitudes, and opinions, in a systematic and objective way. This data is often used to test hypotheses, identify patterns, and establish correlations between variables. Quantitative data collection methods allow for precise measurement and generalization of findings to a larger population. It is commonly used in fields such as economics, psychology, and natural sciences.

Data Collection Methods

Data Collection Methods are as follows:

Surveys involve asking questions to a sample of individuals or organizations to collect data. Surveys can be conducted in person, over the phone, or online.

Interviews involve a one-on-one conversation between the interviewer and the respondent. Interviews can be structured or unstructured and can be conducted in person or over the phone.

Focus Groups

Focus groups are group discussions that are moderated by a facilitator. Focus groups are used to collect qualitative data on a specific topic.

Observation

Observation involves watching and recording the behavior of people, objects, or events in their natural setting. Observation can be done overtly or covertly, depending on the research question.

Experiments

Experiments involve manipulating one or more variables and observing the effect on another variable. Experiments are commonly used in scientific research.

Case Studies

Case studies involve in-depth analysis of a single individual, organization, or event. Case studies are used to gain detailed information about a specific phenomenon.

Secondary Data Analysis

Secondary data analysis involves using existing data that was collected for another purpose. Secondary data can come from various sources, such as government agencies, academic institutions, or private companies.

How to Collect Data

The following are some steps to consider when collecting data:

Define the objective : Before you start collecting data, you need to define the objective of the study. This will help you determine what data you need to collect and how to collect it.
Identify the data sources : Identify the sources of data that will help you achieve your objective. These sources can be primary sources, such as surveys, interviews, and observations, or secondary sources, such as books, articles, and databases.
Determine the data collection method : Once you have identified the data sources, you need to determine the data collection method. This could be through online surveys, phone interviews, or face-to-face meetings.
Develop a data collection plan : Develop a plan that outlines the steps you will take to collect the data. This plan should include the timeline, the tools and equipment needed, and the personnel involved.
Test the data collection process: Before you start collecting data, test the data collection process to ensure that it is effective and efficient.
Collect the data: Collect the data according to the plan you developed in step 4. Make sure you record the data accurately and consistently.
Analyze the data: Once you have collected the data, analyze it to draw conclusions and make recommendations.
Report the findings: Report the findings of your data analysis to the relevant stakeholders. This could be in the form of a report, a presentation, or a publication.
Monitor and evaluate the data collection process: After the data collection process is complete, monitor and evaluate the process to identify areas for improvement in future data collection efforts.
Ensure data quality: Ensure that the collected data is of high quality and free from errors. This can be achieved by validating the data for accuracy, completeness, and consistency.
Maintain data security: Ensure that the collected data is secure and protected from unauthorized access or disclosure. This can be achieved by implementing data security protocols and using secure storage and transmission methods.
Follow ethical considerations: Follow ethical considerations when collecting data, such as obtaining informed consent from participants, protecting their privacy and confidentiality, and ensuring that the research does not cause harm to participants.
Use appropriate data analysis methods : Use appropriate data analysis methods based on the type of data collected and the research objectives. This could include statistical analysis, qualitative analysis, or a combination of both.
Record and store data properly: Record and store the collected data properly, in a structured and organized format. This will make it easier to retrieve and use the data in future research or analysis.
Collaborate with other stakeholders : Collaborate with other stakeholders, such as colleagues, experts, or community members, to ensure that the data collected is relevant and useful for the intended purpose.

Applications of Data Collection

Data collection methods are widely used in different fields, including social sciences, healthcare, business, education, and more. Here are some examples of how data collection methods are used in different fields:

Social sciences : Social scientists often use surveys, questionnaires, and interviews to collect data from individuals or groups. They may also use observation to collect data on social behaviors and interactions. This data is often used to study topics such as human behavior, attitudes, and beliefs.
Healthcare : Data collection methods are used in healthcare to monitor patient health and track treatment outcomes. Electronic health records and medical charts are commonly used to collect data on patients’ medical history, diagnoses, and treatments. Researchers may also use clinical trials and surveys to collect data on the effectiveness of different treatments.
Business : Businesses use data collection methods to gather information on consumer behavior, market trends, and competitor activity. They may collect data through customer surveys, sales reports, and market research studies. This data is used to inform business decisions, develop marketing strategies, and improve products and services.
Education : In education, data collection methods are used to assess student performance and measure the effectiveness of teaching methods. Standardized tests, quizzes, and exams are commonly used to collect data on student learning outcomes. Teachers may also use classroom observation and student feedback to gather data on teaching effectiveness.
Agriculture : Farmers use data collection methods to monitor crop growth and health. Sensors and remote sensing technology can be used to collect data on soil moisture, temperature, and nutrient levels. This data is used to optimize crop yields and minimize waste.
Environmental sciences : Environmental scientists use data collection methods to monitor air and water quality, track climate patterns, and measure the impact of human activity on the environment. They may use sensors, satellite imagery, and laboratory analysis to collect data on environmental factors.
Transportation : Transportation companies use data collection methods to track vehicle performance, optimize routes, and improve safety. GPS systems, on-board sensors, and other tracking technologies are used to collect data on vehicle speed, fuel consumption, and driver behavior.

Examples of Data Collection

Examples of Data Collection are as follows:

Traffic Monitoring: Cities collect real-time data on traffic patterns and congestion through sensors on roads and cameras at intersections. This information can be used to optimize traffic flow and improve safety.
Social Media Monitoring : Companies can collect real-time data on social media platforms such as Twitter and Facebook to monitor their brand reputation, track customer sentiment, and respond to customer inquiries and complaints in real-time.
Weather Monitoring: Weather agencies collect real-time data on temperature, humidity, air pressure, and precipitation through weather stations and satellites. This information is used to provide accurate weather forecasts and warnings.
Stock Market Monitoring : Financial institutions collect real-time data on stock prices, trading volumes, and other market indicators to make informed investment decisions and respond to market fluctuations in real-time.
Health Monitoring : Medical devices such as wearable fitness trackers and smartwatches can collect real-time data on a person’s heart rate, blood pressure, and other vital signs. This information can be used to monitor health conditions and detect early warning signs of health issues.

Purpose of Data Collection

The purpose of data collection can vary depending on the context and goals of the study, but generally, it serves to:

Provide information: Data collection provides information about a particular phenomenon or behavior that can be used to better understand it.
Measure progress : Data collection can be used to measure the effectiveness of interventions or programs designed to address a particular issue or problem.
Support decision-making : Data collection provides decision-makers with evidence-based information that can be used to inform policies, strategies, and actions.
Identify trends : Data collection can help identify trends and patterns over time that may indicate changes in behaviors or outcomes.
Monitor and evaluate : Data collection can be used to monitor and evaluate the implementation and impact of policies, programs, and initiatives.

When to use Data Collection

Data collection is used when there is a need to gather information or data on a specific topic or phenomenon. It is typically used in research, evaluation, and monitoring and is important for making informed decisions and improving outcomes.

Data collection is particularly useful in the following scenarios:

Research : When conducting research, data collection is used to gather information on variables of interest to answer research questions and test hypotheses.
Evaluation : Data collection is used in program evaluation to assess the effectiveness of programs or interventions, and to identify areas for improvement.
Monitoring : Data collection is used in monitoring to track progress towards achieving goals or targets, and to identify any areas that require attention.
Decision-making: Data collection is used to provide decision-makers with information that can be used to inform policies, strategies, and actions.
Quality improvement : Data collection is used in quality improvement efforts to identify areas where improvements can be made and to measure progress towards achieving goals.

Characteristics of Data Collection

Data collection can be characterized by several important characteristics that help to ensure the quality and accuracy of the data gathered. These characteristics include:

Validity : Validity refers to the accuracy and relevance of the data collected in relation to the research question or objective.
Reliability : Reliability refers to the consistency and stability of the data collection process, ensuring that the results obtained are consistent over time and across different contexts.
Objectivity : Objectivity refers to the impartiality of the data collection process, ensuring that the data collected is not influenced by the biases or personal opinions of the data collector.
Precision : Precision refers to the degree of accuracy and detail in the data collected, ensuring that the data is specific and accurate enough to answer the research question or objective.
Timeliness : Timeliness refers to the efficiency and speed with which the data is collected, ensuring that the data is collected in a timely manner to meet the needs of the research or evaluation.
Ethical considerations : Ethical considerations refer to the ethical principles that must be followed when collecting data, such as ensuring confidentiality and obtaining informed consent from participants.

Advantages of Data Collection

There are several advantages of data collection that make it an important process in research, evaluation, and monitoring. These advantages include:

Better decision-making : Data collection provides decision-makers with evidence-based information that can be used to inform policies, strategies, and actions, leading to better decision-making.
Improved understanding: Data collection helps to improve our understanding of a particular phenomenon or behavior by providing empirical evidence that can be analyzed and interpreted.
Evaluation of interventions: Data collection is essential in evaluating the effectiveness of interventions or programs designed to address a particular issue or problem.
Identifying trends and patterns: Data collection can help identify trends and patterns over time that may indicate changes in behaviors or outcomes.
Increased accountability: Data collection increases accountability by providing evidence that can be used to monitor and evaluate the implementation and impact of policies, programs, and initiatives.
Validation of theories: Data collection can be used to test hypotheses and validate theories, leading to a better understanding of the phenomenon being studied.
Improved quality: Data collection is used in quality improvement efforts to identify areas where improvements can be made and to measure progress towards achieving goals.

Limitations of Data Collection

While data collection has several advantages, it also has some limitations that must be considered. These limitations include:

Bias : Data collection can be influenced by the biases and personal opinions of the data collector, which can lead to inaccurate or misleading results.
Sampling bias : Data collection may not be representative of the entire population, resulting in sampling bias and inaccurate results.
Cost : Data collection can be expensive and time-consuming, particularly for large-scale studies.
Limited scope: Data collection is limited to the variables being measured, which may not capture the entire picture or context of the phenomenon being studied.
Ethical considerations : Data collection must follow ethical principles to protect the rights and confidentiality of the participants, which can limit the type of data that can be collected.
Data quality issues: Data collection may result in data quality issues such as missing or incomplete data, measurement errors, and inconsistencies.
Limited generalizability : Data collection may not be generalizable to other contexts or populations, limiting the generalizability of the findings.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Delimitations in Research – Types, Examples and...

Research Process – Steps, Examples and Tips

Research Design – Types, Methods and Examples

Institutional Review Board – Application Sample...

Evaluating Research – Process, Examples and...

Research Questions – Types, Examples and Writing...

Research Data Lifecycle Guide

Developing a Data Management Plan

This section breaks down different topics required for the planning and preparation of data used in research at Case Western Reserve University. In this phase you should understand the research being conducted, the type and methods used for collecting data, the methods used to prepare and analyze the data, addressing budgets and resources required, and have a sound understanding of how you will manage data activities during your research project.

Many federal sponsors of Case Western Reserve funded research have required data sharing plans in research proposals since 2003. As of Jan. 25, 2023, the National Institutes of Health has revised its data management and sharing requirements.

This website is designed to provide basic information and best practices to seasoned and new investigators as well as detailed guidance for adhering to the revised NIH policy.

Basics of Research Data Management

What is research data management?

Research data management (RDM) comprises a set of best practices that include file organization, documentation, storage, backup, security, preservation, and sharing, which affords researchers the ability to more quickly, efficiently, and accurately find, access, and understand their own or others' research data.

Why should you care about research data management?

RDM practices, if applied consistently and as early in a project as possible, can save you considerable time and effort later, when specific data are needed, when others need to make sense of your data, or when you decide to share or otherwise upload your data to a digital repository. Adopting RDM practices will also help you more easily comply with the data management plan (DMP) required for obtaining grants from many funding agencies and institutions.

Does data need to be retained after a project is completed?

Research data must be retained in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, primacy and compliance with laws and regulations governing the conduct of the research. External funding agencies will each have different requirements regarding storage, retention, and availability of research data. Please carefully review your award or agreement for the disposition of data requirements and data retention policies.

A good data management plan begins by understanding the sponsor requirements funding your research. As a principal investigator (PI) it is your responsibility to be knowledgeable of sponsors requirements. The Data Management Plan Tool (DMPTool) has been designed to help PIs adhere to sponsor requirements efficiently and effectively. It is strongly recommended that you take advantage of the DMPTool.

CWRU has an institutional account with DMPTool that enables users to access all of its resources via your Single Sign On credentials. CWRU's DMPTool account is supported by members of the Digital Scholarship team with the Freedman Center for Digital Scholarship. Please use the RDM Intake Request form to schedule a consultation if you would like support or guidance regarding developing a Data Management Plan.

Some basic steps to get started:

Sign into the DMPTool site to start creating a DMP for managing and sharing your data.
On the DMPTool site, you can find the most up to date templates for creating a DMP for a long list of funders, including the NIH, NEH, NSF, and more.
Explore sample DMPs to see examples of successful plans .

Be sure that your DMP is addressing any and all federal and/or funder requirements and associated DMP templates that may apply to your project. It is strongly recommended that investigators submitting proposals to the NIH utilize this tool.

The NIH is mandating Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing a NIH Data Management Plan has its own dedicated content to provide investigators detailed guidance on development of these plans for inclusion in proposals.

A Data Management Plan can help create and maintain reliable data and promote project success. DMPs, when carefully constructed and reliably adhered to, help guide elements of your research and data organization.

A DMP can help you:

Document your process and data.

Maintain a file with information on researchers and collaborators and their roles, sponsors/funding sources, methods/techniques/protocols/standards used, instrumentation, software (w/versions), references used, any applicable restrictions on its distribution or use.
Establish how you will document file changes, name changes, dates of changes, etc. Where will you record of these changes? Try to keep this sort of information in a plain text file located in the same folder as the files to which it pertains.
How are derived data products created? A DMP encourages consistent description of data processing performed, software (including version number) used, and analyses applied to data.
Establish regular forms or templates for data collection. This helps reduce gaps in your data, promotes consistency throughout the project.

Explain your data

From the outset, consider why your data were collected, what the known and expected conditions may be for collection, and information such as time and place, resolution, and standards of data collected.
What attributes, fields, or parameters will be studied and included in your data files? Identify and describe these in each file that employs them.
For an overview of data dictionaries, see the USGS page here: https://www.usgs.gov/products/data-and-tools/data-management/data-dictionaries

DMP Requirements

Why are you being asked to include a data management plan (DMP) in your grant application? For grants awarded by US governmental agencies, two federal memos from the US Office of Science and Technology Policy (OSTP), issued in 2013 and 2015 , respectively, have prompted this requirement. These memos mandate public access to federally- (and, thus, taxpayer-) funded research results, reflecting a commitment by the government to greater accountability and transparency. While "results" generally refers to the publications and reports produced from a research project, it is increasingly used to refer to the resulting data as well.

Federal research-funding agencies have responded to the OSTP memos by issuing their own guidelines and requirements for grant applicants (see below), specifying whether and how research data in particular are to be managed in order to be publicly and properly accessible.

NSF—National Science Foundation "Proposals submitted or due on or after January 18, 2011, must include a supplementary document of no more than two pages labeled 'Data Management Plan'. This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results." Note: Additional requirements may apply per Directorate, Office, Division, Program, or other NSF unit.
NIH—National Institutes of Health "To facilitate data sharing, investigators submitting a research application requesting $500,000 or more of direct costs in any single year to NIH on or after October 1, 2003 are expected to include a plan for sharing final research data for research purposes, or state why data sharing is not possible."
NASA—National Aeronautics and Space Administration "The purpose of a Data Management Plan (DMP) is to address the management of data from Earth science missions, from the time of their data collection/observation, to their entry into permanent archives."
DOD—Department of Defense "A Data Management Plan (DMP) describing the scientific data expected to be created or gathered in the course of a research project must be submitted to DTIC at the start of each research effort. It is important that DoD researchers document plans for preserving data at the outset, keeping in mind the potential utility of the data for future research or to support transition to operational or other environments. Otherwise, the data is lost as researchers move on to other efforts. The essential descriptive elements of the DMP are listed in section 3 of DoDI 3200.12, although the format of the plan may be adjusted to conform to standards established by the relevant scientific discipline or one that meets the requirements of the responsible Component"
Department of Education "The purpose of this document is to describe the implementation of this policy on public access to data and to provide guidance to applicants for preparing the Data Management Plan (DMP) that must outline data sharing and be submitted with the grant application. The DMP should describe a plan to provide discoverable and citable dataset(s) with sufficient documentation to support responsible use by other researchers, and should address four interrelated concerns—access, permissions, documentation, and resources—which must be considered in the earliest stages of planning for the grant."
" Office of Scientific and Technical Information (OSTI) Provides access to free, publicly-available research sponsored by the Department of Energy (DOE), including technical reports, bibliographic citations, journal articles, conference papers, books, multimedia, software, and data.

Data Management Best Practices

As you plan to collect data for research, keep in mind the following best practices.

Keep Your Data Accessible to You

Store your temporary working files somewhere easily accessible, like on a local hard drive or shared server.
While cloud storage is a convenient solution for storage and sharing, there are often concerns about data privacy and preservation. Be sure to only put data in the cloud that you are comfortable with and that your funding and/or departmental requirements allow.
For long-term storage, data should be put into preservation systems that are well-managed. [U]Tech provides several long-term data storage options for cloud and campus.
Don't keep your original data on a thumb drive or portable hard drive, as it can be easily lost or stolen.
Think about file formats that have a long life and that are readable by many programs. Formats like ascii, .txt, .csv, .pdf are great for long term preservation.
A DMP is not a replacement for good data management practices, but it can set you on the right path if it is consistently followed. Consistently revisit your plan to ensure you are following it and adhering to funder requirements.

Preservation

Know the difference between storing and preserving your data. True preservation is the ongoing process of making sure your data are secure and accessible for future generations. Many sponsors have preferred or recommended data repositories. The DMP tool can help you identify these preferred repositories.
Identify data with long-term value. Preserve the raw data and any intermediate/derived products that are expensive to reproduce or can be directly used for analysis. Preserve any scripted code that was used to clean and transform the data.
Whenever converting your data from one format to another, keep a copy of the original file and format to avoid loss or corruption of your important files.
Leverage online platforms like OSF can help your group organize, version, share, and preserve your data, if the sponsor hasn’t specified a specific platform.
Adhere to federal sponsor requirements on utilizing accepted data repositories (NIH dbGaP, NIH SRA, NIH CRDC, etc.) for preservation.

Backup, Backup, Backup

The general rule is to keep 3 copies of your data: 2 copies onsite, 1 offsite.
Backup your data regularly and frequently - automate the process if possible. This may mean weekly duplication of your working files to a separate drive, syncing your folders to a cloud service like Box, or dedicating a block of time every week to ensure you've copied everything to another location.

Organization

Establish a consistent, descriptive filing system that is intelligible to future researchers and does not rely on your own inside knowledge of your research.
A descriptive directory and file-naming structure should guide users through the contents to help them find whatever they are looking for.

Naming Conventions

Use consistent, descriptive filenames that reliably indicate the contents of the file.
If your discipline requires or recommends particular naming conventions, use them!
Do not use spaces between words. Use either camelcase or underscores to separate words
Include LastnameFirstname descriptors where appropriate.
Avoid using MM-DD-YYYY formats
Do not append vague descriptors like "latest" or "final" to your file versions. Instead, append the version's date or a consistently iterated version number.

Clean Your Data

Mistakes happen, and often researchers don't notice at first. If you are manually entering data, be sure to double-check the entries for consistency and duplication. Often having a fresh set of eyes will help to catch errors before they become problems.
Tabular data can often be error checked by sorting the fields alphanumerically to catch simple typos, extra spaces, or otherwise extreme outliers. Be sure to save your data before sorting it to ensure you do not disrupt the records!
Programs like OpenRefine are useful for checking for consistency in coding for records and variables, catching missing values, transforming data, and much more.

What should you do if you need assistance implementing RDM practices?

Whether it's because you need discipline-specific metadata standards for your data, help with securing sensitive data, or assistance writing a data management plan for a grant, help is available to you at CWRU. In addition to consulting the resources featured in this guide, you are encouraged to contact your department's liaison librarian.

If you are planning to submit a research proposal and need assistance with budgeting for data storage and or applications used to capture, manage, and or process data UTech provides information and assistance including resource boilerplates that list what centralized resources are available.

More specific guidance for including a budget for Data Management and Sharing is included on this document: Budgeting for Data Management and Sharing .

Custody of Research Data

The PI is the custodian of research data, unless agreed on in writing otherwise and the agreement is on file with the University, and is responsible for the collection, management, and retention of research data. The PI should adopt an orderly system of data organization and should communicate the chosen system to all members of a research group and to the appropriate administrative personnel, where applicable. Particularly for long-term research projects, the PI should establish and maintain procedures for the protection and management of essential records.

CWRU Custody of Research Data Policy

Data Sharing

Many funding agencies require data to be shared for the purposes of reproducibility and other important scientific goals. It is important to plan for the timely release and sharing of final research data for use by other researchers. The final release of data should be included as a key deliverable of the DMP. Knowledge of the discipline-specific database, data repository, data enclave, or archive store used to disseminate the data should also be documented as needed.

The NIH is mandating Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing a NIH Data Management and Sharing Plan has its own dedicated content to provide investigators detailed guidance on development of these plans for inclusion in proposals.

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

View all journals
My Account Login
Explore content
About the journal
Publish with us
Sign up for alerts
Data Descriptor
Open access
Published: 03 May 2024

A dataset for measuring the impact of research data and their curation

Libby Hemphill ORCID: orcid.org/0000-0002-3793-7281 1 , 2 ,
Andrea Thomer 3 ,
Sara Lafia 1 ,
Lizhou Fan 2 ,
David Bleckley ORCID: orcid.org/0000-0001-7715-4348 1 &
Elizabeth Moss 1

Scientific Data volume 11 , Article number: 442 ( 2024 ) Cite this article

686 Accesses

8 Altmetric

Metrics details

Research data
Social sciences

Science funders, publishers, and data archives make decisions about how to responsibly allocate resources to maximize the reuse potential of research data. This paper introduces a dataset developed to measure the impact of archival and data curation decisions on data reuse. The dataset describes 10,605 social science research datasets, their curation histories, and reuse contexts in 94,755 publications that cover 59 years from 1963 to 2022. The dataset was constructed from study-level metadata, citing publications, and curation records available through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. The dataset includes information about study-level attributes (e.g., PIs, funders, subject terms); usage statistics (e.g., downloads, citations); archiving decisions (e.g., curation activities, data transformations); and bibliometric attributes (e.g., journals, authors) for citing publications. This dataset provides information on factors that contribute to long-term data reuse, which can inform the design of effective evidence-based recommendations to support high-impact research data curation decisions.

SciSciNet: A large-scale open data lake for the science of science research

Data, measurement and empirical methods in the science of science

Interdisciplinarity revisited: evidence for research impact and dynamism

Background & summary.

Recent policy changes in funding agencies and academic journals have increased data sharing among researchers and between researchers and the public. Data sharing advances science and provides the transparency necessary for evaluating, replicating, and verifying results. However, many data-sharing policies do not explain what constitutes an appropriate dataset for archiving or how to determine the value of datasets to secondary users 1 , 2 , 3 . Questions about how to allocate data-sharing resources efficiently and responsibly have gone unanswered 4 , 5 , 6 . For instance, data-sharing policies recognize that not all data should be curated and preserved, but they do not articulate metrics or guidelines for determining what data are most worthy of investment.

Despite the potential for innovation and advancement that data sharing holds, the best strategies to prioritize datasets for preparation and archiving are often unclear. Some datasets are likely to have more downstream potential than others, and data curation policies and workflows should prioritize high-value data instead of being one-size-fits-all. Though prior research in library and information science has shown that the “analytic potential” of a dataset is key to its reuse value 7 , work is needed to implement conceptual data reuse frameworks 8 , 9 , 10 , 11 , 12 , 13 , 14 . In addition, publishers and data archives need guidance to develop metrics and evaluation strategies to assess the impact of datasets.

Several existing resources have been compiled to study the relationship between the reuse of scholarly products, such as datasets (Table 1 ); however, none of these resources include explicit information on how curation processes are applied to data to increase their value, maximize their accessibility, and ensure their long-term preservation. The CCex (Curation Costs Exchange) provides models of curation services along with cost-related datasets shared by contributors but does not make explicit connections between them or include reuse information 15 . Analyses on platforms such as DataCite 16 have focused on metadata completeness and record usage, but have not included related curation-level information. Analyses of GenBank 17 and FigShare 18 , 19 citation networks do not include curation information. Related studies of Github repository reuse 20 and Softcite software citation 21 reveal significant factors that impact the reuse of secondary research products but do not focus on research data. RD-Switchboard 22 and DSKG 23 are scholarly knowledge graphs linking research data to articles, patents, and grants, but largely omit social science research data and do not include curation-level factors. To our knowledge, other studies of curation work in organizations similar to ICPSR – such as GESIS 24 , Dataverse 25 , and DANS 26 – have not made their underlying data available for analysis.

This paper describes a dataset 27 compiled for the MICA project (Measuring the Impact of Curation Actions) led by investigators at ICPSR, a large social science data archive at the University of Michigan. The dataset was originally developed to study the impacts of data curation and archiving on data reuse. The MICA dataset has supported several previous publications investigating the intensity of data curation actions 28 , the relationship between data curation actions and data reuse 29 , and the structures of research communities in a data citation network 30 . Collectively, these studies help explain the return on various types of curatorial investments. The dataset that we introduce in this paper, which we refer to as the MICA dataset, has the potential to address research questions in the areas of science (e.g., knowledge production), library and information science (e.g., scholarly communication), and data archiving (e.g., reproducible workflows).

We constructed the MICA dataset 27 using records available at ICPSR, a large social science data archive at the University of Michigan. Data set creation involved: collecting and enriching metadata for articles indexed in the ICPSR Bibliography of Data-related Literature against the Dimensions AI bibliometric database; gathering usage statistics for studies from ICPSR’s administrative database; processing data curation work logs from ICPSR’s project tracking platform, Jira; and linking data in social science studies and series to citing analysis papers (Fig. 1 ).

Steps to prepare MICA dataset for analysis - external sources are red, primary internal sources are blue, and internal linked sources are green.

Enrich paper metadata

The ICPSR Bibliography of Data-related Literature is a growing database of literature in which data from ICPSR studies have been used. Its creation was funded by the National Science Foundation (Award 9977984), and for the past 20 years it has been supported by ICPSR membership and multiple US federally-funded and foundation-funded topical archives at ICPSR. The Bibliography was originally launched in the year 2000 to aid in data discovery by providing a searchable database linking publications to the study data used in them. The Bibliography collects the universe of output based on the data shared in each study through, which is made available through each ICPSR study’s webpage. The Bibliography contains both peer-reviewed and grey literature, which provides evidence for measuring the impact of research data. For an item to be included in the ICPSR Bibliography, it must contain an analysis of data archived by ICPSR or contain a discussion or critique of the data collection process, study design, or methodology 31 . The Bibliography is manually curated by a team of librarians and information specialists at ICPSR who enter and validate entries. Some publications are supplied to the Bibliography by data depositors, and some citations are submitted to the Bibliography by authors who abide by ICPSR’s terms of use requiring them to submit citations to works in which they analyzed data retrieved from ICPSR. Most of the Bibliography is populated by Bibliography team members, who create custom queries for ICPSR studies performed across numerous sources, including Google Scholar, ProQuest, SSRN, and others. Each record in the Bibliography is one publication that has used one or more ICPSR studies. The version we used was captured on 2021-11-16 and included 94,755 publications.

To expand the coverage of the ICPSR Bibliography, we searched exhaustively for all ICPSR study names, unique numbers assigned to ICPSR studies, and DOIs 32 using a full-text index available through the Dimensions AI database 33 . We accessed Dimensions through a license agreement with the University of Michigan. ICPSR Bibliography librarians and information specialists manually reviewed and validated new entries that matched one or more search criteria. We then used Dimensions to gather enriched metadata and full-text links for items in the Bibliography with DOIs. We matched 43% of the items in the Bibliography to enriched Dimensions metadata including abstracts, field of research codes, concepts, and authors’ institutional information; we also obtained links to full text for 16% of Bibliography items. Based on licensing agreements, we included Dimensions identifiers and links to full text so that users with valid publisher and database access can construct an enriched publication dataset.

Gather study usage data

ICPSR maintains a relational administrative database, DBInfo, that organizes study-level metadata and information on data reuse across separate tables. Studies at ICPSR consist of one or more files collected at a single time or for a single purpose; studies in which the same variables are observed over time are grouped into series. Each study at ICPSR is assigned a DOI, and its metadata are stored in DBInfo. Study metadata follows the Data Documentation Initiative (DDI) Codebook 2.5 standard. DDI elements included in our dataset are title, ICPSR study identification number, DOI, authoring entities, description (abstract), funding agencies, subject terms assigned to the study during curation, and geographic coverage. We also created variables based on DDI elements: total variable count, the presence of survey question text in the metadata, the number of author entities, and whether an author entity was an institution. We gathered metadata for ICPSR’s 10,605 unrestricted public-use studies available as of 2021-11-16 ( https://www.icpsr.umich.edu/web/pages/membership/or/metadata/oai.html ).

To link study usage data with study-level metadata records, we joined study metadata from DBinfo on study usage information, which included total study downloads (data and documentation), individual data file downloads, and cumulative citations from the ICPSR Bibliography. We also gathered descriptive metadata for each study and its variables, which allowed us to summarize and append recoded fields onto the study-level metadata such as curation level, number and type of principle investigators, total variable count, and binary variables indicating whether the study data were made available for online analysis, whether survey question text was made searchable online, and whether the study variables were indexed for search. These characteristics describe aspects of the discoverability of the data to compare with other characteristics of the study. We used the study and series numbers included in the ICPSR Bibliography as unique identifiers to link papers to metadata and analyze the community structure of dataset co-citations in the ICPSR Bibliography 32 .

Process curation work logs

Researchers deposit data at ICPSR for curation and long-term preservation. Between 2016 and 2020, more than 3,000 research studies were deposited with ICPSR. Since 2017, ICPSR has organized curation work into a central unit that provides varied levels of curation that vary in the intensity and complexity of data enhancement that they provide. While the levels of curation are standardized as to effort (level one = less effort, level three = most effort), the specific curatorial actions undertaken for each dataset vary. The specific curation actions are captured in Jira, a work tracking program, which data curators at ICPSR use to collaborate and communicate their progress through tickets. We obtained access to a corpus of 669 completed Jira tickets corresponding to the curation of 566 unique studies between February 2017 and December 2019 28 .

To process the tickets, we focused only on their work log portions, which contained free text descriptions of work that data curators had performed on a deposited study, along with the curators’ identifiers, and timestamps. To protect the confidentiality of the data curators and the processing steps they performed, we collaborated with ICPSR’s curation unit to propose a classification scheme, which we used to train a Naive Bayes classifier and label curation actions in each work log sentence. The eight curation action labels we proposed 28 were: (1) initial review and planning, (2) data transformation, (3) metadata, (4) documentation, (5) quality checks, (6) communication, (7) other, and (8) non-curation work. We note that these categories of curation work are very specific to the curatorial processes and types of data stored at ICPSR, and may not match the curation activities at other repositories. After applying the classifier to the work log sentences, we obtained summary-level curation actions for a subset of all ICPSR studies (5%), along with the total number of hours spent on data curation for each study, and the proportion of time associated with each action during curation.

Data Records

The MICA dataset 27 connects records for each of ICPSR’s archived research studies to the research publications that use them and related curation activities available for a subset of studies (Fig. 2 ). Each of the three tables published in the dataset is available as a study archived at ICPSR. The data tables are distributed as statistical files available for use in SAS, SPSS, Stata, and R as well as delimited and ASCII text files. The dataset is organized around studies and papers as primary entities. The studies table lists ICPSR studies, their metadata attributes, and usage information; the papers table was constructed using the ICPSR Bibliography and Dimensions database; and the curation logs table summarizes the data curation steps performed on a subset of ICPSR studies.

Studies (“ICPSR_STUDIES”): 10,605 social science research datasets available through ICPSR up to 2021-11-16 with variables for ICPSR study number, digital object identifier, study name, series number, series title, authoring entities, full-text description, release date, funding agency, geographic coverage, subject terms, topical archive, curation level, single principal investigator (PI), institutional PI, the total number of PIs, total variables in data files, question text availability, study variable indexing, level of restriction, total unique users downloading study data files and codebooks, total unique users downloading data only, and total unique papers citing data through November 2021. Studies map to the papers and curation logs table through ICPSR study numbers as “STUDY”. However, not every study in this table will have records in the papers and curation logs tables.

Papers (“ICPSR_PAPERS”): 94,755 publications collected from 2000-08-11 to 2021-11-16 in the ICPSR Bibliography and enriched with metadata from the Dimensions database with variables for paper number, identifier, title, authors, publication venue, item type, publication date, input date, ICPSR series numbers used in the paper, ICPSR study numbers used in the paper, the Dimension identifier, and the Dimensions link to the publication’s full text. Papers map to the studies table through ICPSR study numbers in the “STUDY_NUMS” field. Each record represents a single publication, and because a researcher can use multiple datasets when creating a publication, each record may list multiple studies or series.

Curation logs (“ICPSR_CURATION_LOGS”): 649 curation logs for 563 ICPSR studies (although most studies in the subset had one curation log, some studies were associated with multiple logs, with a maximum of 10) curated between February 2017 and December 2019 with variables for study number, action labels assigned to work description sentences using a classifier trained on ICPSR curation logs, hours of work associated with a single log entry, and total hours of work logged for the curation ticket. Curation logs map to the study and paper tables through ICPSR study numbers as “STUDY”. Each record represents a single logged action, and future users may wish to aggregate actions to the study level before joining tables.

Entity-relation diagram.

Technical Validation

We report on the reliability of the dataset’s metadata in the following subsections. To support future reuse of the dataset, curation services provided through ICPSR improved data quality by checking for missing values, adding variable labels, and creating a codebook.

All 10,605 studies available through ICPSR have a DOI and a full-text description summarizing what the study is about, the purpose of the study, the main topics covered, and the questions the PIs attempted to answer when they conducted the study. Personal names (i.e., principal investigators) and organizational names (i.e., funding agencies) are standardized against an authority list maintained by ICPSR; geographic names and subject terms are also standardized and hierarchically indexed in the ICPSR Thesaurus 34 . Many of ICPSR’s studies (63%) are in a series and are distributed through the ICPSR General Archive (56%), a non-topical archive that accepts any social or behavioral science data. While study data have been available through ICPSR since 1962, the earliest digital release date recorded for a study was 1984-03-18, when ICPSR’s database was first employed, and the most recent date is 2021-10-28 when the dataset was collected.

Curation level information was recorded starting in 2017 and is available for 1,125 studies (11%); approximately 80% of studies with assigned curation levels received curation services, equally distributed between Levels 1 (least intensive), 2 (moderately intensive), and 3 (most intensive) (Fig. 3 ). Detailed descriptions of ICPSR’s curation levels are available online 35 . Additional metadata are available for a subset of 421 studies (4%), including information about whether the study has a single PI, an institutional PI, the total number of PIs involved, total variables recorded is available for online analysis, has searchable question text, has variables that are indexed for search, contains one or more restricted files, and whether the study is completely restricted. We provided additional metadata for this subset of ICPSR studies because they were released within the past five years and detailed curation and usage information were available for them. Usage statistics including total downloads and data file downloads are available for this subset of studies as well; citation statistics are available for 8,030 studies (76%). Most ICPSR studies have fewer than 500 users, as indicated by total downloads, or citations (Fig. 4 ).

ICPSR study curation levels.

ICPSR study usage.

A subset of 43,102 publications (45%) available in the ICPSR Bibliography had a DOI. Author metadata were entered as free text, meaning that variations may exist and require additional normalization and pre-processing prior to analysis. While author information is standardized for each publication, individual names may appear in different sort orders (e.g., “Earls, Felton J.” and “Stephen W. Raudenbush”). Most of the items in the ICPSR Bibliography as of 2021-11-16 were journal articles (59%), reports (14%), conference presentations (9%), or theses (8%) (Fig. 5 ). The number of publications collected in the Bibliography has increased each decade since the inception of ICPSR in 1962 (Fig. 6 ). Most ICPSR studies (76%) have one or more citations in a publication.

ICPSR Bibliography citation types.

ICPSR citations by decade.

Usage Notes

The dataset consists of three tables that can be joined using the “STUDY” key as shown in Fig. 2 . The “ICPSR_PAPERS” table contains one row per paper with one or more cited studies in the “STUDY_NUMS” column. We manipulated and analyzed the tables as CSV files with the Pandas library 36 in Python and the Tidyverse packages 37 in R.

The present MICA dataset can be used independently to study the relationship between curation decisions and data reuse. Evidence of reuse for specific studies is available in several forms: usage information, including downloads and citation counts; and citation contexts within papers that cite data. Analysis may also be performed on the citation network formed between datasets and papers that use them. Finally, curation actions can be associated with properties of studies and usage histories.

This dataset has several limitations of which users should be aware. First, Jira tickets can only be used to represent the intensiveness of curation for activities undertaken since 2017, when ICPSR started using both Curation Levels and Jira. Studies published before 2017 were all curated, but documentation of the extent of that curation was not standardized and therefore could not be included in these analyses. Second, the measure of publications relies upon the authors’ clarity of data citation and the ICPSR Bibliography staff’s ability to discover citations with varying formality and clarity. Thus, there is always a chance that some secondary-data-citing publications have been left out of the bibliography. Finally, there may be some cases in which a paper in the ICSPSR bibliography did not actually obtain data from ICPSR. For example, PIs have often written about or even distributed their data prior to their archival in ICSPR. Therefore, those publications would not have cited ICPSR but they are still collected in the Bibliography as being directly related to the data that were eventually deposited at ICPSR.

In summary, the MICA dataset contains relationships between two main types of entities – papers and studies – which can be mined. The tables in the MICA dataset have supported network analysis (community structure and clique detection) 30 ; natural language processing (NER for dataset reference detection) 32 ; visualizing citation networks (to search for datasets) 38 ; and regression analysis (on curation decisions and data downloads) 29 . The data are currently being used to develop research metrics and recommendation systems for research data. Given that DOIs are provided for ICPSR studies and articles in the ICPSR Bibliography, the MICA dataset can also be used with other bibliometric databases, including DataCite, Crossref, OpenAlex, and related indexes. Subscription-based services, such as Dimensions AI, are also compatible with the MICA dataset. In some cases, these services provide abstracts or full text for papers from which data citation contexts can be extracted for semantic content analysis.

Code availability

The code 27 used to produce the MICA project dataset is available on GitHub at https://github.com/ICPSR/mica-data-descriptor and through Zenodo with the identifier https://doi.org/10.5281/zenodo.8432666 . Data manipulation and pre-processing were performed in Python. Data curation for distribution was performed in SPSS.

He, L. & Han, Z. Do usage counts of scientific data make sense? An investigation of the Dryad repository. Library Hi Tech 35 , 332–342 (2017).

Article Google Scholar

Brickley, D., Burgess, M. & Noy, N. Google dataset search: Building a search engine for datasets in an open web ecosystem. In The World Wide Web Conference - WWW ‘19 , 1365–1375 (ACM Press, San Francisco, CA, USA, 2019).

Buneman, P., Dosso, D., Lissandrini, M. & Silvello, G. Data citation and the citation graph. Quantitative Science Studies 2 , 1399–1422 (2022).

Chao, T. C. Disciplinary reach: Investigating the impact of dataset reuse in the earth sciences. Proceedings of the American Society for Information Science and Technology 48 , 1–8 (2011).

Article ADS Google Scholar

Parr, C. et al . A discussion of value metrics for data repositories in earth and environmental sciences. Data Science Journal 18 , 58 (2019).

Eschenfelder, K. R., Shankar, K. & Downey, G. The financial maintenance of social science data archives: Four case studies of long–term infrastructure work. J. Assoc. Inf. Sci. Technol. 73 , 1723–1740 (2022).

Palmer, C. L., Weber, N. M. & Cragin, M. H. The analytic potential of scientific data: Understanding re-use value. Proceedings of the American Society for Information Science and Technology 48 , 1–10 (2011).

Zimmerman, A. S. New knowledge from old data: The role of standards in the sharing and reuse of ecological data. Sci. Technol. Human Values 33 , 631–652 (2008).

Cragin, M. H., Palmer, C. L., Carlson, J. R. & Witt, M. Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368 , 4023–4038 (2010).

Article ADS CAS Google Scholar

Fear, K. M. Measuring and Anticipating the Impact of Data Reuse . Ph.D. thesis, University of Michigan (2013).

Borgman, C. L., Van de Sompel, H., Scharnhorst, A., van den Berg, H. & Treloar, A. Who uses the digital data archive? An exploratory study of DANS. Proceedings of the Association for Information Science and Technology 52 , 1–4 (2015).

Pasquetto, I. V., Borgman, C. L. & Wofford, M. F. Uses and reuses of scientific data: The data creators’ advantage. Harvard Data Science Review 1 (2019).

Gregory, K., Groth, P., Scharnhorst, A. & Wyatt, S. Lost or found? Discovering data needed for research. Harvard Data Science Review (2020).

York, J. Seeking equilibrium in data reuse: A study of knowledge satisficing . Ph.D. thesis, University of Michigan (2022).

Kilbride, W. & Norris, S. Collaborating to clarify the cost of curation. New Review of Information Networking 19 , 44–48 (2014).

Robinson-Garcia, N., Mongeon, P., Jeng, W. & Costas, R. DataCite as a novel bibliometric source: Coverage, strengths and limitations. Journal of Informetrics 11 , 841–854 (2017).

Qin, J., Hemsley, J. & Bratt, S. E. The structural shift and collaboration capacity in GenBank networks: A longitudinal study. Quantitative Science Studies 3 , 174–193 (2022).

Article PubMed PubMed Central Google Scholar

Acuna, D. E., Yi, Z., Liang, L. & Zhuang, H. Predicting the usage of scientific datasets based on article, author, institution, and journal bibliometrics. In Smits, M. (ed.) Information for a Better World: Shaping the Global Future. iConference 2022 ., 42–52 (Springer International Publishing, Cham, 2022).

Zeng, T., Wu, L., Bratt, S. & Acuna, D. E. Assigning credit to scientific datasets using article citation networks. Journal of Informetrics 14 , 101013 (2020).

Koesten, L., Vougiouklis, P., Simperl, E. & Groth, P. Dataset reuse: Toward translating principles to practice. Patterns 1 , 100136 (2020).

Du, C., Cohoon, J., Lopez, P. & Howison, J. Softcite dataset: A dataset of software mentions in biomedical and economic research publications. J. Assoc. Inf. Sci. Technol. 72 , 870–884 (2021).

Aryani, A. et al . A research graph dataset for connecting research data repositories using RD-Switchboard. Sci Data 5 , 180099 (2018).

Färber, M. & Lamprecht, D. The data set knowledge graph: Creating a linked open data source for data sets. Quantitative Science Studies 2 , 1324–1355 (2021).

Perry, A. & Netscher, S. Measuring the time spent on data curation. Journal of Documentation 78 , 282–304 (2022).

Trisovic, A. et al . Advancing computational reproducibility in the Dataverse data repository platform. In Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems , P-RECS ‘20, 15–20, https://doi.org/10.1145/3391800.3398173 (Association for Computing Machinery, New York, NY, USA, 2020).

Borgman, C. L., Scharnhorst, A. & Golshan, M. S. Digital data archives as knowledge infrastructures: Mediating data sharing and reuse. Journal of the Association for Information Science and Technology 70 , 888–904, https://doi.org/10.1002/asi.24172 (2019).

Lafia, S. et al . MICA Data Descriptor. Zenodo https://doi.org/10.5281/zenodo.8432666 (2023).

Lafia, S., Thomer, A., Bleckley, D., Akmon, D. & Hemphill, L. Leveraging machine learning to detect data curation activities. In 2021 IEEE 17th International Conference on eScience (eScience) , 149–158, https://doi.org/10.1109/eScience51609.2021.00025 (2021).

Hemphill, L., Pienta, A., Lafia, S., Akmon, D. & Bleckley, D. How do properties of data, their curation, and their funding relate to reuse? J. Assoc. Inf. Sci. Technol. 73 , 1432–44, https://doi.org/10.1002/asi.24646 (2021).

Lafia, S., Fan, L., Thomer, A. & Hemphill, L. Subdivisions and crossroads: Identifying hidden community structures in a data archive’s citation network. Quantitative Science Studies 3 , 694–714, https://doi.org/10.1162/qss_a_00209 (2022).

ICPSR. ICPSR Bibliography of Data-related Literature: Collection Criteria. https://www.icpsr.umich.edu/web/pages/ICPSR/citations/collection-criteria.html (2023).

Lafia, S., Fan, L. & Hemphill, L. A natural language processing pipeline for detecting informal data references in academic literature. Proc. Assoc. Inf. Sci. Technol. 59 , 169–178, https://doi.org/10.1002/pra2.614 (2022).

Hook, D. W., Porter, S. J. & Herzog, C. Dimensions: Building context for search and evaluation. Frontiers in Research Metrics and Analytics 3 , 23, https://doi.org/10.3389/frma.2018.00023 (2018).

https://www.icpsr.umich.edu/web/ICPSR/thesaurus (2002). ICPSR. ICPSR Thesaurus.

https://www.icpsr.umich.edu/files/datamanagement/icpsr-curation-levels.pdf (2020). ICPSR. ICPSR Curation Levels.

McKinney, W. Data Structures for Statistical Computing in Python. In van der Walt, S. & Millman, J. (eds.) Proceedings of the 9th Python in Science Conference , 56–61 (2010).

Wickham, H. et al . Welcome to the Tidyverse. Journal of Open Source Software 4 , 1686 (2019).

Fan, L., Lafia, S., Li, L., Yang, F. & Hemphill, L. DataChat: Prototyping a conversational agent for dataset search and visualization. Proc. Assoc. Inf. Sci. Technol. 60 , 586–591 (2023).

Download references

Acknowledgements

We thank the ICPSR Bibliography staff, the ICPSR Data Curation Unit, and the ICPSR Data Stewardship Committee for their support of this research. This material is based upon work supported by the National Science Foundation under grant 1930645. This project was made possible in part by the Institute of Museum and Library Services LG-37-19-0134-19.

Author information

Authors and affiliations.

Inter-university Consortium for Political and Social Research, University of Michigan, Ann Arbor, MI, 48104, USA

Libby Hemphill, Sara Lafia, David Bleckley & Elizabeth Moss

School of Information, University of Michigan, Ann Arbor, MI, 48104, USA

Libby Hemphill & Lizhou Fan

School of Information, University of Arizona, Tucson, AZ, 85721, USA

Andrea Thomer

You can also search for this author in PubMed Google Scholar

Contributions

L.H. and A.T. conceptualized the study design, D.B., E.M., and S.L. prepared the data, S.L., L.F., and L.H. analyzed the data, and D.B. validated the data. All authors reviewed and edited the manuscript.

Corresponding author

Correspondence to Libby Hemphill .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Hemphill, L., Thomer, A., Lafia, S. et al. A dataset for measuring the impact of research data and their curation. Sci Data 11 , 442 (2024). https://doi.org/10.1038/s41597-024-03303-2

Download citation

Received : 16 November 2023

Accepted : 24 April 2024

Published : 03 May 2024

DOI : https://doi.org/10.1038/s41597-024-03303-2

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Quick links

Explore articles by subject
Guide to authors
Editorial policies

Liberty University
Jerry Falwell Library
Special Collections
< Previous

Home > ETD > Doctoral > 5543

Doctoral Dissertations and Projects

Virtual coaching, self-directed learning, and the implementation of evidence-based practices: a single qualitative case study.

Elisabeth Myers , Liberty University Follow

School of Education

Doctor of Philosophy in Education (PhD)

Christine Saba

virtual coaching, self-directed learning, evidence-based practices, self-determination theory, sustained implementation

Disciplines

Education | Educational Leadership

Recommended Citation

Myers, Elisabeth, "Virtual Coaching, Self-Directed Learning, and the Implementation of Evidence-Based Practices: A Single Qualitative Case Study" (2024). Doctoral Dissertations and Projects . 5543. https://digitalcommons.liberty.edu/doctoral/5543

The purpose of this single instrumental case study was to understand how a virtual coaching program provides opportunities for self-directed learning during the implementation of evidence-based practices for adults at Navigator Coaching. The theory guiding this study was Deci and Ryan’s self-determination theory as conceptualizations of self-directed learning described in the literature mirror descriptions of self-determination. The central research question was: How does a virtual coaching program provide opportunities for self-directed learning during the implementation of evidence-based practices? As a single instrumental case, the setting for this study was one virtual life-coaching program in North America. The sample of participants included 12 adults who were currently enrolled in the program for a minimum of 6 months and participated in weekly program activities. Multiple data collection methods were employed to describe and understand the case: observations, audiovisual materials, and individual interviews. Interpretational analysis and a multistep data analysis process including direct interpretation, categorical aggregation, correspondence tables, and interpretive commentaries were utilized to develop the themes and overall synthesis of the case. Opportunities for self-directed learning were provided in weekly live sessions, modules in the program library, and in the Facebook group. Program members utilized instructional opportunities to satisfy their need for autonomy, thus becoming students of self. Participation in a purposeful community that was focused on solutions provided opportunities for program members to satisfy competence and relatedness needs. Program members implemented evidence-based practices and developed skills to create weekly learning plans, which assisted them in becoming agents of their highest selves.

Included in

Educational Leadership Commons

Collections
Faculty Expert Gallery
Theses and Dissertations
Conferences and Events
Open Educational Resources (OER)
Explore Disciplines

Advanced Search

Notify me via email or RSS .

Faculty Authors

Submit Research
Expert Gallery Login

Student Authors

Undergraduate Submissions
Graduate Submissions
Honors Submissions

Home | About | FAQ | My Account | Accessibility Statement

Privacy Copyright

A woman standing in a server room holding a laptop connected to a series of tall, black servers cabinets.

Published: 5 April 2024 Contributors: Tim Mucci, Cole Stryker

Big data analytics refers to the systematic processing and analysis of large amounts of data and complex data sets, known as big data, to extract valuable insights. Big data analytics allows for the uncovering of trends, patterns and correlations in large amounts of raw data to help analysts make data-informed decisions. This process allows organizations to leverage the exponentially growing data generated from diverse sources, including internet-of-things (IoT) sensors, social media, financial transactions and smart devices to derive actionable intelligence through advanced analytic techniques.

In the early 2000s, advances in software and hardware capabilities made it possible for organizations to collect and handle large amounts of unstructured data. With this explosion of useful data, open-source communities developed big data frameworks to store and process this data. These frameworks are used for distributed storage and processing of large data sets across a network of computers. Along with additional tools and libraries, big data frameworks can be used for:

Predictive modeling by incorporating artificial intelligence (AI) and statistical algorithms
Statistical analysis for in-depth data exploration and to uncover hidden patterns
What-if analysis to simulate different scenarios and explore potential outcomes
Processing diverse data sets, including structured, semi-structured and unstructured data from various sources.

Four main data analysis methods – descriptive, diagnostic, predictive and prescriptive – are used to uncover insights and patterns within an organization's data. These methods facilitate a deeper understanding of market trends, customer preferences and other important business metrics.

IBM named a Leader in the 2024 Gartner® Magic Quadrant™ for Augmented Data Quality Solutions.

Structured vs unstructured data

What is data management?

The main difference between big data analytics and traditional data analytics is the type of data handled and the tools used to analyze it. Traditional analytics deals with structured data, typically stored in relational databases . This type of database helps ensure that data is well-organized and easy for a computer to understand. Traditional data analytics relies on statistical methods and tools like structured query language (SQL) for querying databases.

Big data analytics involves massive amounts of data in various formats, including structured, semi-structured and unstructured data. The complexity of this data requires more sophisticated analysis techniques. Big data analytics employs advanced techniques like machine learning and data mining to extract information from complex data sets. It often requires distributed processing systems like Hadoop to manage the sheer volume of data.

These are the four methods of data analysis at work within big data:

The "what happened" stage of data analysis. Here, the focus is on summarizing and describing past data to understand its basic characteristics.

The “why it happened” stage. By delving deep into the data, diagnostic analysis identifies the root patterns and trends observed in descriptive analytics.

The “what will happen” stage. It uses historical data, statistical modeling and machine learning to forecast trends.

Describes the “what to do” stage, which goes beyond prediction to provide recommendations for optimizing future actions based on insights derived from all previous.

The following dimensions highlight the core challenges and opportunities inherent in big data analytics.

The sheer volume of data generated today, from social media feeds, IoT devices, transaction records and more, presents a significant challenge. Traditional data storage and processing solutions are often inadequate to handle this scale efficiently. Big data technologies and cloud-based storage solutions enable organizations to store and manage these vast data sets cost-effectively, protecting valuable data from being discarded due to storage limitations.

Data is being produced at unprecedented speeds, from real-time social media updates to high-frequency stock trading records. The velocity at which data flows into organizations requires robust processing capabilities to capture, process and deliver accurate analysis in near real-time. Stream processing frameworks and in-memory data processing are designed to handle these rapid data streams and balance supply with demand.

Today's data comes in many formats, from structured to numeric data in traditional databases to unstructured text, video and images from diverse sources like social media and video surveillance. This variety demans flexible data management systems to handle and integrate disparate data types for comprehensive analysis. NoSQL databases , data lakes and schema -on-read technologies provide the necessary flexibility to accommodate the diverse nature of big data.

Data reliability and accuracy are critical, as decisions based on inaccurate or incomplete data can lead to negative outcomes. Veracity refers to the data's trustworthiness, encompassing data quality, noise and anomaly detection issues. Techniques and tools for data cleaning, validation and verification are integral to ensuring the integrity of big data, enabling organizations to make better decisions based on reliable information.

Big data analytics aims to extract actionable insights that offer tangible value. This involves turning vast data sets into meaningful information that can inform strategic decisions, uncover new opportunities and drive innovation. Advanced analytics, machine learning and AI are key to unlocking the value contained within big data, transforming raw data into strategic assets.

Data professionals, analysts, scientists and statisticians prepare and process data in a data lakehouse, which combines the performance of a data lakehouse with the flexibility of a data lake to clean data and ensure its quality. The process of turning raw data into valuable insights encompasses several key stages:

Collect data: The first step involves gathering data, which can be a mix of structured and unstructured forms from myriad sources like cloud, mobile applications and IoT sensors. This step is where organizations adapt their data collection strategies and integrate data from varied sources into central repositories like a data lake, which can automatically assign metadata for better manageability and accessibility.
Process data: After being collected, data must be systematically organized, extracted, transformed and then loaded into a storage system to ensure accurate analytical outcomes. Processing involves converting raw data into a format that is usable for analysis, which might involve aggregating data from different sources, converting data types or organizing data into structure formats. Given the exponential growth of available data, this stage can be challenging. Processing strategies may vary between batch processing, which handles large data volumes over extended periods and stream processing, which deals with smaller real-time data batches.
Clean data: Regardless of size, data must be cleaned to ensure quality and relevance. Cleaning data involves formatting it correctly, removing duplicates and eliminating irrelevant entries. Clean data prevents the corruption of output and safeguard’s reliability and accuracy.
Analyze data: Advanced analytics, such as data mining, predictive analytics, machine learning and deep learning, are employed to sift through the processed and cleaned data. These methods allow users to discover patterns, relationships and trends within the data, providing a solid foundation for informed decision-making.

Under the Analyze umbrella, there are potentially many technologies at work, including data mining, which is used to identify patterns and relationships within large data sets; predictive analytics, which forecasts future trends and opportunities; and deep learning , which mimics human learning patterns to uncover more abstract ideas.

Deep learning uses an artificial neural network with multiple layers to model complex patterns in data. Unlike traditional machine learning algorithms, deep learning learns from images, sound and text without manual help. For big data analytics, this powerful capability means the volume and complexity of data is not an issue.

Natural language processing (NLP) models allow machines to understand, interpret and generate human language. Within big data analytics, NLP extracts insights from massive unstructured text data generated across an organization and beyond.

Structured Data

Structured data refers to highly organized information that is easily searchable and typically stored in relational databases or spreadsheets. It adheres to a rigid schema, meaning each data element is clearly defined and accessible in a fixed field within a record or file. Examples of structured data include:

Customer names and addresses in a customer relationship management (CRM) system
Transactional data in financial records, such as sales figures and account balances
Employee data in human resources databases, including job titles and salaries

Structured data's main advantage is its simplicity for entry, search and analysis, often using straightforward database queries like SQL. However, the rapidly expanding universe of big data means that structured data represents a relatively small portion of the total data available to organizations.

Unstructured Data

Unstructured data lacks a pre-defined data model, making it more difficult to collect, process and analyze. It comprises the majority of data generated today, and includes formats such as:

Textual content from documents, emails and social media posts
Multimedia content, including images, audio files and videos
Data from IoT devices, which can include a mix of sensor data, log files and time-series data

The primary challenge with unstructured data is its complexity and lack of uniformity, requiring more sophisticated methods for indexing, searching and analyzing. NLP, machine learning and advanced analytics platforms are often employed to extract meaningful insights from unstructured data.

Semi-structured data

Semi-structured data occupies the middle ground between structured and unstructured data. While it does not reside in a relational database, it contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Examples include:

JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) files, which are commonly used for web data interchange
Email, where the data has a standardized format (e.g., headers, subject, body) but the content within each section is unstructured
NoSQL databases, can store and manage semi-structured data more efficiently than traditional relational databases

Semi-structured data is more flexible than structured data but easier to analyze than unstructured data, providing a balance that is particularly useful in web applications and data integration tasks.

Ensuring data quality and integrity, integrating disparate data sources, protecting data privacy and security and finding the right talent to analyze and interpret data can present challenges to organizations looking to leverage their extensive data volumes. What follows are the benefits organizations can realize once they see success with big data analytics:

Real-time intelligence

One of the standout advantages of big data analytics is the capacity to provide real-time intelligence. Organizations can analyze vast amounts of data as it is generated from myriad sources and in various formats. Real-time insight allows businesses to make quick decisions, respond to market changes instantaneously and identify and act on opportunities as they arise.

Better-informed decisions

With big data analytics, organizations can uncover previously hidden trends, patterns and correlations. A deeper understanding equips leaders and decision-makers with the information needed to strategize effectively, enhancing business decision-making in supply chain management, e-commerce, operations and overall strategic direction.

Cost savings

Big data analytics drives cost savings by identifying business process efficiencies and optimizations. Organizations can pinpoint wasteful expenditures by analyzing large datasets, streamlining operations and enhancing productivity. Moreover, predictive analytics can forecast future trends, allowing companies to allocate resources more efficiently and avoid costly missteps.

Better customer engagement

Understanding customer needs, behaviors and sentiments is crucial for successful engagement and big data analytics provides the tools to achieve this understanding. Companies gain insights into consumer preferences and tailor their marketing strategies by analyzing customer data.

Optimized risk management strategies

Big data analytics enhances an organization's ability to manage risk by providing the tools to identify, assess and address threats in real time. Predictive analytics can foresee potential dangers before they materialize, allowing companies to devise preemptive strategies.

As organizations across industries seek to leverage data to drive decision-making, improve operational efficiencies and enhance customer experiences, the demand for skilled professionals in big data analytics has surged. Here are some prominent career paths that utilize big data analytics:

Data scientist

Data scientists analyze complex digital data to assist businesses in making decisions. Using their data science training and advanced analytics technologies, including machine learning and predictive modeling, they uncover hidden insights in data.

Data analyst

Data analysts turn data into information and information into insights. They use statistical techniques to analyze and extract meaningful trends from data sets, often to inform business strategy and decisions.

Data engineer

Data engineers prepare, process and manage big data infrastructure and tools. They also develop, maintain, test and evaluate data solutions within organizations, often working with massive datasets to assist in analytics projects.

Machine learning engineer

Machine learning engineers focus on designing and implementing machine learning applications. They develop sophisticated algorithms that learn from and make predictions on data.

Business intelligence analyst

Business intelligence (BI) analysts help businesses make data-driven decisions by analyzing data to produce actionable insights. They often use BI tools to convert data into easy-to-understand reports and visualizations for business stakeholders.

Data visualization specialist

These specialists focus on the visual representation of data. They create data visualizations that help end users understand the significance of data by placing it in a visual context.

Data architect

Data architects design, create, deploy and manage an organization's data architecture. They define how data is stored, consumed, integrated and managed by different data entities and IT systems.

IBM and Cloudera have partnered to create an industry-leading, enterprise-grade big data framework distribution plus a variety of cloud services and products — all designed to achieve faster analytics at scale.

IBM Db2 Database on IBM Cloud Pak for Data combines a proven, AI-infused, enterprise-ready data management system with an integrated data and AI platform built on the security-rich, scalable Red Hat OpenShift foundation.

IBM Big Replicate is an enterprise-class data replication software platform that keeps data consistent in a distributed environment, on-premises and in the hybrid cloud, including SQL and NoSQL databases.

A data warehouse is a system that aggregates data from different sources into a single, central, consistent data store to support data analysis, data mining, artificial intelligence and machine learning.

Business intelligence gives organizations the ability to get answers they can understand. Instead of using best guesses, they can base decisions on what their business data is telling them — whether it relates to production, supply chain, customers or market trends.

Cloud computing is the on-demand access of physical or virtual servers, data storage, networking capabilities, application development tools, software, AI analytic tools and more—over the internet with pay-per-use pricing. The cloud computing model offers customers flexibility and scalability compared to traditional infrastructure.

Purpose-built data-driven architecture helps support business intelligence across the organization. IBM analytics solutions allow organizations to simplify raw data access, provide end-to-end data management and empower business users with AI-driven self-service analytics to predict outcomes.

Artificial intelligence in strategy

Can machines automate strategy development? The short answer is no. However, there are numerous aspects of strategists’ work where AI and advanced analytics tools can already bring enormous value. Yuval Atsmon is a senior partner who leads the new McKinsey Center for Strategy Innovation, which studies ways new technologies can augment the timeless principles of strategy. In this episode of the Inside the Strategy Room podcast, he explains how artificial intelligence is already transforming strategy and what’s on the horizon. This is an edited transcript of the discussion. For more conversations on the strategy issues that matter, follow the series on your preferred podcast platform .

Joanna Pachner: What does artificial intelligence mean in the context of strategy?

Yuval Atsmon: When people talk about artificial intelligence, they include everything to do with analytics, automation, and data analysis. Marvin Minsky, the pioneer of artificial intelligence research in the 1960s, talked about AI as a “suitcase word”—a term into which you can stuff whatever you want—and that still seems to be the case. We are comfortable with that because we think companies should use all the capabilities of more traditional analysis while increasing automation in strategy that can free up management or analyst time and, gradually, introducing tools that can augment human thinking.

Joanna Pachner: AI has been embraced by many business functions, but strategy seems to be largely immune to its charms. Why do you think that is?

Subscribe to the Inside the Strategy Room podcast

Yuval Atsmon: You’re right about the limited adoption. Only 7 percent of respondents to our survey about the use of AI say they use it in strategy or even financial planning, whereas in areas like marketing, supply chain, and service operations, it’s 25 or 30 percent. One reason adoption is lagging is that strategy is one of the most integrative conceptual practices. When executives think about strategy automation, many are looking too far ahead—at AI capabilities that would decide, in place of the business leader, what the right strategy is. They are missing opportunities to use AI in the building blocks of strategy that could significantly improve outcomes.

I like to use the analogy to virtual assistants. Many of us use Alexa or Siri but very few people use these tools to do more than dictate a text message or shut off the lights. We don’t feel comfortable with the technology’s ability to understand the context in more sophisticated applications. AI in strategy is similar: it’s hard for AI to know everything an executive knows, but it can help executives with certain tasks.

When executives think about strategy automation, many are looking too far ahead—at AI deciding the right strategy. They are missing opportunities to use AI in the building blocks of strategy.

Joanna Pachner: What kind of tasks can AI help strategists execute today?

Yuval Atsmon: We talk about six stages of AI development. The earliest is simple analytics, which we refer to as descriptive intelligence. Companies use dashboards for competitive analysis or to study performance in different parts of the business that are automatically updated. Some have interactive capabilities for refinement and testing.

The second level is diagnostic intelligence, which is the ability to look backward at the business and understand root causes and drivers of performance. The level after that is predictive intelligence: being able to anticipate certain scenarios or options and the value of things in the future based on momentum from the past as well as signals picked in the market. Both diagnostics and prediction are areas that AI can greatly improve today. The tools can augment executives’ analysis and become areas where you develop capabilities. For example, on diagnostic intelligence, you can organize your portfolio into segments to understand granularly where performance is coming from and do it in a much more continuous way than analysts could. You can try 20 different ways in an hour versus deploying one hundred analysts to tackle the problem.

Predictive AI is both more difficult and more risky. Executives shouldn’t fully rely on predictive AI, but it provides another systematic viewpoint in the room. Because strategic decisions have significant consequences, a key consideration is to use AI transparently in the sense of understanding why it is making a certain prediction and what extrapolations it is making from which information. You can then assess if you trust the prediction or not. You can even use AI to track the evolution of the assumptions for that prediction.

Those are the levels available today. The next three levels will take time to develop. There are some early examples of AI advising actions for executives’ consideration that would be value-creating based on the analysis. From there, you go to delegating certain decision authority to AI, with constraints and supervision. Eventually, there is the point where fully autonomous AI analyzes and decides with no human interaction.

Because strategic decisions have significant consequences, you need to understand why AI is making a certain prediction and what extrapolations it’s making from which information.

Joanna Pachner: What kind of businesses or industries could gain the greatest benefits from embracing AI at its current level of sophistication?

Yuval Atsmon: Every business probably has some opportunity to use AI more than it does today. The first thing to look at is the availability of data. Do you have performance data that can be organized in a systematic way? Companies that have deep data on their portfolios down to business line, SKU, inventory, and raw ingredients have the biggest opportunities to use machines to gain granular insights that humans could not.

Companies whose strategies rely on a few big decisions with limited data would get less from AI. Likewise, those facing a lot of volatility and vulnerability to external events would benefit less than companies with controlled and systematic portfolios, although they could deploy AI to better predict those external events and identify what they can and cannot control.

Third, the velocity of decisions matters. Most companies develop strategies every three to five years, which then become annual budgets. If you think about strategy in that way, the role of AI is relatively limited other than potentially accelerating analyses that are inputs into the strategy. However, some companies regularly revisit big decisions they made based on assumptions about the world that may have since changed, affecting the projected ROI of initiatives. Such shifts would affect how you deploy talent and executive time, how you spend money and focus sales efforts, and AI can be valuable in guiding that. The value of AI is even bigger when you can make decisions close to the time of deploying resources, because AI can signal that your previous assumptions have changed from when you made your plan.

Joanna Pachner: Can you provide any examples of companies employing AI to address specific strategic challenges?

Yuval Atsmon: Some of the most innovative users of AI, not coincidentally, are AI- and digital-native companies. Some of these companies have seen massive benefits from AI and have increased its usage in other areas of the business. One mobility player adjusts its financial planning based on pricing patterns it observes in the market. Its business has relatively high flexibility to demand but less so to supply, so the company uses AI to continuously signal back when pricing dynamics are trending in a way that would affect profitability or where demand is rising. This allows the company to quickly react to create more capacity because its profitability is highly sensitive to keeping demand and supply in equilibrium.

Joanna Pachner: Given how quickly things change today, doesn’t AI seem to be more a tactical than a strategic tool, providing time-sensitive input on isolated elements of strategy?

Yuval Atsmon: It’s interesting that you make the distinction between strategic and tactical. Of course, every decision can be broken down into smaller ones, and where AI can be affordably used in strategy today is for building blocks of the strategy. It might feel tactical, but it can make a massive difference. One of the world’s leading investment firms, for example, has started to use AI to scan for certain patterns rather than scanning individual companies directly. AI looks for consumer mobile usage that suggests a company’s technology is catching on quickly, giving the firm an opportunity to invest in that company before others do. That created a significant strategic edge for them, even though the tool itself may be relatively tactical.

Joanna Pachner: McKinsey has written a lot about cognitive biases and social dynamics that can skew decision making. Can AI help with these challenges?

Yuval Atsmon: When we talk to executives about using AI in strategy development, the first reaction we get is, “Those are really big decisions; what if AI gets them wrong?” The first answer is that humans also get them wrong—a lot. [Amos] Tversky, [Daniel] Kahneman, and others have proven that some of those errors are systemic, observable, and predictable. The first thing AI can do is spot situations likely to give rise to biases. For example, imagine that AI is listening in on a strategy session where the CEO proposes something and everyone says “Aye” without debate and discussion. AI could inform the room, “We might have a sunflower bias here,” which could trigger more conversation and remind the CEO that it’s in their own interest to encourage some devil’s advocacy.

We also often see confirmation bias, where people focus their analysis on proving the wisdom of what they already want to do, as opposed to looking for a fact-based reality. Just having AI perform a default analysis that doesn’t aim to satisfy the boss is useful, and the team can then try to understand why that is different than the management hypothesis, triggering a much richer debate.

In terms of social dynamics, agency problems can create conflicts of interest. Every business unit [BU] leader thinks that their BU should get the most resources and will deliver the most value, or at least they feel they should advocate for their business. AI provides a neutral way based on systematic data to manage those debates. It’s also useful for executives with decision authority, since we all know that short-term pressures and the need to make the quarterly and annual numbers lead people to make different decisions on the 31st of December than they do on January 1st or October 1st. Like the story of Ulysses and the sirens, you can use AI to remind you that you wanted something different three months earlier. The CEO still decides; AI can just provide that extra nudge.

Joanna Pachner: It’s like you have Spock next to you, who is dispassionate and purely analytical.

Yuval Atsmon: That is not a bad analogy—for Star Trek fans anyway.

Joanna Pachner: Do you have a favorite application of AI in strategy?

Yuval Atsmon: I have worked a lot on resource allocation, and one of the challenges, which we call the hockey stick phenomenon, is that executives are always overly optimistic about what will happen. They know that resource allocation will inevitably be defined by what you believe about the future, not necessarily by past performance. AI can provide an objective prediction of performance starting from a default momentum case: based on everything that happened in the past and some indicators about the future, what is the forecast of performance if we do nothing? This is before we say, “But I will hire these people and develop this new product and improve my marketing”— things that every executive thinks will help them overdeliver relative to the past. The neutral momentum case, which AI can calculate in a cold, Spock-like manner, can change the dynamics of the resource allocation discussion. It’s a form of predictive intelligence accessible today and while it’s not meant to be definitive, it provides a basis for better decisions.

Joanna Pachner: Do you see access to technology talent as one of the obstacles to the adoption of AI in strategy, especially at large companies?

Yuval Atsmon: I would make a distinction. If you mean machine-learning and data science talent or software engineers who build the digital tools, they are definitely not easy to get. However, companies can increasingly use platforms that provide access to AI tools and require less from individual companies. Also, this domain of strategy is exciting—it’s cutting-edge, so it’s probably easier to get technology talent for that than it might be for manufacturing work.

The bigger challenge, ironically, is finding strategists or people with business expertise to contribute to the effort. You will not solve strategy problems with AI without the involvement of people who understand the customer experience and what you are trying to achieve. Those who know best, like senior executives, don’t have time to be product managers for the AI team. An even bigger constraint is that, in some cases, you are asking people to get involved in an initiative that may make their jobs less important. There could be plenty of opportunities for incorporating AI into existing jobs, but it’s something companies need to reflect on. The best approach may be to create a digital factory where a different team tests and builds AI applications, with oversight from senior stakeholders.

The big challenge is finding strategists to contribute to the AI effort. You are asking people to get involved in an initiative that may make their jobs less important.

Joanna Pachner: Do you think this worry about job security and the potential that AI will automate strategy is realistic?

Yuval Atsmon: The question of whether AI will replace human judgment and put humanity out of its job is a big one that I would leave for other experts.

The pertinent question is shorter-term automation. Because of its complexity, strategy would be one of the later domains to be affected by automation, but we are seeing it in many other domains. However, the trend for more than two hundred years has been that automation creates new jobs, although ones requiring different skills. That doesn’t take away the fear some people have of a machine exposing their mistakes or doing their job better than they do it.

Joanna Pachner: We recently published an article about strategic courage in an age of volatility that talked about three types of edge business leaders need to develop. One of them is an edge in insights. Do you think AI has a role to play in furnishing a proprietary insight edge?

Yuval Atsmon: One of the challenges most strategists face is the overwhelming complexity of the world we operate in—the number of unknowns, the information overload. At one level, it may seem that AI will provide another layer of complexity. In reality, it can be a sharp knife that cuts through some of the clutter. The question to ask is, Can AI simplify my life by giving me sharper, more timely insights more easily?

Joanna Pachner: You have been working in strategy for a long time. What sparked your interest in exploring this intersection of strategy and new technology?

Yuval Atsmon: I have always been intrigued by things at the boundaries of what seems possible. Science fiction writer Arthur C. Clarke’s second law is that to discover the limits of the possible, you have to venture a little past them into the impossible, and I find that particularly alluring in this arena.

AI in strategy is in very nascent stages but could be very consequential for companies and for the profession. For a top executive, strategic decisions are the biggest way to influence the business, other than maybe building the top team, and it is amazing how little technology is leveraged in that process today. It’s conceivable that competitive advantage will increasingly rest in having executives who know how to apply AI well. In some domains, like investment, that is already happening, and the difference in returns can be staggering. I find helping companies be part of that evolution very exciting.

Explore a career with us

Strategic courage in an age of volatility

Bias Busters Collection

COMMENTS

(PDF) Collecting data through case studies
The case study is a data collection method in which in-depth descriptive information. about specific entities, or cases, is collected, organized, interpreted, and presented in a. narrative format ...
Case Study Methodology of Qualitative Research: Key Attributes and
In a case study research, multiple methods of data collection are used, as it involves an in-depth study of a phenomenon. It must be noted, as highlighted by Yin , a case study is not a method of data collection, rather is a research strategy or design to study a social unit.
What is a Case Study?
A case study protocol outlines the procedures and general rules to be followed during the case study. This includes the data collection methods to be used, the sources of data, and the procedures for analysis. Having a detailed case study protocol ensures consistency and reliability in the study.
Case Study
The data collection method should be selected based on the research questions and the nature of the case study phenomenon. Analyze the data: The data collected from the case study should be analyzed using various techniques, such as content analysis, thematic analysis, or grounded theory.
Planning Qualitative Research: Design and Decision Making for New
While many books and articles guide various qualitative research methods and analyses, there is currently no concise resource that explains and differentiates among the most common qualitative approaches. We believe novice qualitative researchers, students planning the design of a qualitative study or taking an introductory qualitative research course, and faculty teaching such courses can ...
Case Study Method: A Step-by-Step Guide for Business Researchers
Although case studies have been discussed extensively in the literature, little has been written about the specific steps one may use to conduct case study research effectively (Gagnon, 2010; Hancock & Algozzine, 2016).Baskarada (2014) also emphasized the need to have a succinct guideline that can be practically followed as it is actually tough to execute a case study well in practice.
Data Collection
Data Collection | Definition, Methods & Examples. Published on June 5, 2020 by Pritha Bhandari.Revised on June 21, 2023. Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem.
PDF A (VERY) BRIEF REFRESHER ON THE CASE STUDY METHOD
the case study method favors the collection of data in natural settings, compared with relying on "derived" data (Bromley, 1986, p. 23)—for example, responses to a researcher's instruments in an experiment or responses to questionnaires in a survey.
Case Study Methods and Examples
The purpose of case study research is twofold: (1) to provide descriptive information and (2) to suggest theoretical relevance. Rich description enables an in-depth or sharpened understanding of the case. It is unique given one characteristic: case studies draw from more than one data source. Case studies are inherently multimodal or mixed ...
Collecting data through case studies
The article describes the decisions that need to be made in planning case study research and then presents examples of how case studies can be used in several performance technology applications. The advantages and disadvantages of case studies as a data collection method are discussed and guidelines for their use are given. References ( ). . ...
Collecting data through case studies
The article describes the decisions that need to be made in planning case study research and then presents examples of how case studies can be used in several performance technology applications. The advantages and disadvantages of case studies as a data collection method are discussed and guidelines for their use are given. Volume 46, Issue 7.
Data Collection Methods and Tools for Research; A Step-by-Step Guide to
One of the main stages in a research study is data collection that enables the researcher to find answers to research questions. Data collection is the process of collecting data aiming to gain insights ... It means the findings of case studies can be used just for the same issues as the general patterns for different studies. Qualitative ...
Design: Selection of Data Collection Methods
In this Rip Out we focus on data collection, but in qualitative research, the entire project must be considered. 1, 2 Careful design of the data collection phase requires the following: deciding who will do what, where, when, and how at the different stages of the research process; acknowledging the role of the researcher as an instrument of ...
Qualitative Research: Data Collection, Analysis, and Management
THE PARTICIPANT'S VIEWPOINT. What qualitative study seeks to convey is why people have thoughts and feelings that might affect the way they behave. Such study may occur in any number of contexts, but here, we focus on pharmacy practice and the way people behave with regard to medicines use (e.g., to understand patients' reasons for nonadherence with medication therapy or to explore ...
Toward Developing a Framework for Conducting Case Study Research
Be aware that the data collection in case study research can be both quantitative and qualitative. The results of our analysis appear in Table 1. Data Analysis. Data analysis consists of examining, categorizing, tabulating, or otherwise recombining the evidence to address the initial propositions of a study. The analysis of the case study is ...
Case Study Research Method in Psychology
The case study is not a research method, but researchers select methods of data collection and analysis that will generate material suitable for case studies. Freud (1909a, 1909b) conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.
Data Collection Methods
Step 2: Choose your data collection method. Based on the data you want to collect, decide which method is best suited for your research. Experimental research is primarily a quantitative method. Interviews, focus groups, and ethnographies are qualitative methods. Surveys, observations, archival research, and secondary data collection can be ...
A systematic qualitative case study: questions, data collection, NVivo
The study aims to explore the case study method with the formation of questions, data collection procedures and analysis, followed by how and on which position the saturation is achieved in developing a centralized Shariah governance framework for Islamic banks in Bangladesh.,Using purposive and snowball sampling procedures, data have been ...
Data Collection
Case studies involve in-depth analysis of a single individual, organization, or event. Case studies are used to gain detailed information about a specific phenomenon. ... Research: When conducting research, data collection is used to gather information on variables of interest to answer research questions and test hypotheses. Evaluation: ...
Choosing Data Collection Methods: A Guide for Your Study
Understanding your research goals is the starting point in selecting a data collection method. You need to be clear about what you want to achieve with your study.
Developing a Data Management Plan
"A Data Management Plan (DMP) describing the scientific data expected to be created or gathered in the course of a research project must be submitted to DTIC at the start of each research effort. It is important that DoD researchers document plans for preserving data at the outset, keeping in mind the potential utility of the data for future ...
A dataset for measuring the impact of research data and their ...
Science funders, publishers, and data archives make decisions about how to responsibly allocate resources to maximize the reuse potential of research data. This paper introduces a dataset ...
Virtual Coaching, Self-Directed Learning, and the Implementation of
The purpose of this single instrumental case study was to understand how a virtual coaching program provides opportunities for self-directed learning during the implementation of evidence-based practices for adults at Navigator Coaching. The theory guiding this study was Deci and Ryan's self-determination theory as conceptualizations of self-directed learning described in the literature ...
Best Practices in Data Collection and Preparation: Recommendations for
We offer best-practice recommendations for journal reviewers, editors, and authors regarding data collection and preparation. Our recommendations are applicable to research adopting different epistemological and ontological perspectives—including both quantitative and qualitative approaches—as well as research addressing micro (i.e., individuals, teams) and macro (i.e., organizations ...
What is Big Data Analytics?
The main difference between big data analytics and traditional data analytics is the type of data handled and the tools used to analyze it. Traditional analytics deals with structured data, typically stored in relational databases.This type of database helps ensure that data is well-organized and easy for a computer to understand.
AI strategy in business: A guide for executives
Yuval Atsmon: When people talk about artificial intelligence, they include everything to do with analytics, automation, and data analysis. Marvin Minsky, the pioneer of artificial intelligence research in the 1960s, talked about AI as a "suitcase word"—a term into which you can stuff whatever you want—and that still seems to be the case.
Land
Community green spaces (CGSs) constitute a crucial element of urban land use, playing a pivotal role in maintaining the stability of urban ecosystems and enhancing the overall quality of the urban environment. Through the post-occupancy evaluation (POE) of green spaces, we can gain insights into residents' actual needs and usage habits, providing scientific evidence for the planning, design ...
Case Study Methodology of Qualitative Research: Key Attributes and
1. Case study is a research strategy, and not just a method/technique/process of data collection. 2. A case study involves a detailed study of the concerned unit of analysis within its natural setting. A de-contextualised study has no relevance in a case study research. 3. Since an in-depth study is conducted, a case study research allows the ...