Empirical Methods in Political Science: An Introduction

By Justin Zimmerman

9.1 Introduction

The field of political science has traditionally focused on hypothesis testing, causal inference, experiments, and the use of large n data. Quantitative methods in all their capacities are without a doubt important, but what can at times be lost is the value of small n methods of inquiry within the field. Researchers such as Katherine Cramer, Cathy Cohen, Reuel Rogers, and Jennifer Hochschild, among others, have used small n methods to tell stories about particular groups that have rarely been highlighted in political science. Whether it's identifying rural consciousness in Wisconsin (Cramer 2016), researching the secondary marginalization of the most disfranchised in the black community (Cohen 1999), explaining the unique political stances of Afro-Caribbean immigrants (Rogers 2006), or highlighting the politics of a new racial order (Hochschild, Weaver, and Burch 2012), small n data can allow a researcher to discover new information not easily attainable through quantitative methods alone. Small n methods allow for a more in-depth assessment of a particular place and people.

This chapter will focus on the importance of small n research. It will highlight the various methods for conducting small n research, including interviews, participant observation, focus groups, process tracing, and ethnography, as well as the various procedures for case selection. First, the chapter will elaborate on how the goals of small n research differ from those of quantitative research.

9.2 Background

To be a well-rounded political scientist, it is important to understand that not every question can be answered through quantitative methods alone. There are times when small n methods are the more appropriate option. Yet how does a researcher decide when small n methods are appropriate for their research? The researcher must be able to identify how the purposes of small n qualitative research differ from those of quantitative research. First, quantitative research focuses on the effects of causes, while qualitative research focuses on the causes of effects. In other words, quantitative research, especially with regard to causal inference, aims to determine whether a particular treatment causes a particular outcome, such as whether an increase in an individual's education causes them to become more politically mobilized.

Small n qualitative research, on the other hand, focuses on understanding how the outcome came to be. The work of American Political Development (APD) scholars is a good illustration of this line of thinking. APD scholars trace why certain outcomes came to be, as in Paul Frymer's work on Western expansion in the United States (Frymer 2017) or Chloe Thurston's research on how housing policy has historically discriminated against women, African Americans, and the poor through the use of public-private partnerships (Thurston 2018). Small n qualitative research also includes oral histories, such as those collected by Yolande Bouka concerning the Rwandan genocide (Bouka 2013), and interviews combined with historical context, such as those Yanilda María González uses to explain the coercive power of policing in Latin America (González 2017). In short, small n qualitative research aims to tell the story of how an event or policy came to be and what particular groups experienced as a result of it.

Thus, a small n qualitative researcher must take care to ensure their work satisfies three characteristics of good qualitative research. First, the research must emphasize the cause and its implications. Second, good small n qualitative theories must explain the outcome in all the cases within the population. Lastly, qualitative questions must establish whether events were necessary or sufficient for an outcome to occur, with the cause providing the explanation. In setting up qualitative research, it is important to understand that qualitative methods are more interested in the mechanisms behind things. Small n approaches can help us explore underlying processes, such as how institutions evolve and change. Such a question could be answered by gathering data on many institutions, but it can also be answered by looking closely at institutional change in one or two contexts. Small n qualitative research can be inductive, as when a researcher builds theory and hypotheses from the data, or deductive, as when a researcher tests theories and hypotheses with the data. Whether the research is inductive or deductive, what is critical in building it is case selection.
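Once each case has been coded for the presence or absence of a condition and an outcome, whether the condition is necessary or sufficient can be checked mechanically. The Python sketch below assumes a hypothetical set of four cases with invented codings; `is_necessary` and `is_sufficient` are illustrative helpers, not part of any established package, and in practice coding the cases is itself a careful interpretive step.

```python
# Hypothetical cases coded for whether a condition and an outcome are present.
CASES = [
    {"name": "Case 1", "condition": True, "outcome": True},
    {"name": "Case 2", "condition": True, "outcome": False},
    {"name": "Case 3", "condition": False, "outcome": False},
    {"name": "Case 4", "condition": True, "outcome": True},
]


def is_necessary(cases):
    """Necessary: the outcome never occurs without the condition."""
    return all(case["condition"] for case in cases if case["outcome"])


def is_sufficient(cases):
    """Sufficient: the outcome occurs in every case where the condition is present."""
    return all(case["outcome"] for case in cases if case["condition"])


print("necessary:", is_necessary(CASES))    # Cases 1 and 4 have both condition and outcome
print("sufficient:", is_sufficient(CASES))  # Case 2 has the condition but not the outcome
```

With these invented codings the condition looks necessary but not sufficient, because Case 2 has the condition without the outcome. A single additional case could overturn either verdict, which is one reason case selection matters so much.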

9.3 Case Selection

In small n qualitative research, case selection is set up to use a small number of cases in order to take a deep dive into a specific subject. For instance, a researcher may use a specific neighborhood to explain a specific political characteristic of its community. Reuel Rogers conducted this kind of research when he interviewed Afro-Caribbean residents of New York City about their political preferences as new immigrants to the United States (Rogers 2006). This case selection allowed Rogers to assess the veracity of the age-old claim that pluralism allows immigrants to eventually assimilate into American culture and government participation, by highlighting the complexity that comes from immigrants who are identified as black. Rogers finds that Afro-Caribbean immigrants suffer from discrimination that may hinder their ability to assimilate into American society. Yet how does a researcher decide which cases to use? Seawright and Gerring provide some insight by identifying seven case selection procedures (Seawright and Gerring 2008). This chapter will focus on four of them: most similar, most different, typical, and deviant cases. The chapter will also briefly describe extreme, diverse, and influential cases.

9.3.1 Most Similar

Seawright and Gerring specify that the most similar case selection procedure requires at least two cases to compare. Ideally, when using most similar cases, all variables other than the key independent variable and the dependent variable would be similar. For example, we may compare neighborhoods with similar income, religion, and education profiles, with the key independent variable, such as race, being the only difference. Thus, a researcher could use small n case selection to research the differences or similarities between the residents of a black middle-class neighborhood and those of a white middle-class neighborhood. It should be noted that matching cases on exact characteristics is essentially impossible in the social sciences, which makes this technique daunting, to say the least. Yet part of the compromise of political science, and of social science in general, is doing the best with the information you have and being honest about the limitations. This is especially important when using the most similar case selection procedure.

9.3.2 Most Different

Gerring and Seawright also identify the most different case selection procedure. Most different cases are cases that differ on specified variables other than the key independent variable and the dependent variable. For instance, two neighborhoods may differ in class, education, and religion, while the key independent variable, race, remains the same for both. Gerring and Seawright argue that this tends to be the weaker route to take in comparing two cases, but it is nonetheless an option for a small n researcher under the right circumstances.
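The logic of the most similar and most different procedures can be sketched as a search over pairs of cases. This Python example is a minimal illustration under stated assumptions: the neighborhoods A to D and their values are invented, and the background variables (income, religiosity, education) are assumed to be rescaled to a common 0-1 range so that a simple Euclidean distance is meaningful. A most similar pair differs on the key independent variable (here, race) but is closest on the background variables; a most different pair shares the key variable but is farthest apart on them.

```python
from itertools import combinations
from math import dist

# Hypothetical neighborhoods; background variables rescaled to a 0-1 range.
NEIGHBORHOODS = {
    "A": {"race": "black", "income": 0.62, "religiosity": 0.70, "education": 0.55},
    "B": {"race": "white", "income": 0.60, "religiosity": 0.68, "education": 0.57},
    "C": {"race": "white", "income": 0.90, "religiosity": 0.20, "education": 0.85},
    "D": {"race": "black", "income": 0.30, "religiosity": 0.75, "education": 0.35},
}
CONTROLS = ("income", "religiosity", "education")


def control_distance(a, b):
    """Euclidean distance between two cases on the background variables only."""
    return dist([a[v] for v in CONTROLS], [b[v] for v in CONTROLS])


def most_similar_pair(cases, key="race"):
    """Pair that differs on the key variable but is closest on the background variables."""
    pairs = [(control_distance(cases[x], cases[y]), x, y)
             for x, y in combinations(sorted(cases), 2)
             if cases[x][key] != cases[y][key]]
    return min(pairs)[1:]


def most_different_pair(cases, key="race"):
    """Pair that shares the key variable but is farthest apart on the background variables."""
    pairs = [(control_distance(cases[x], cases[y]), x, y)
             for x, y in combinations(sorted(cases), 2)
             if cases[x][key] == cases[y][key]]
    return max(pairs)[1:]


print("most similar:", most_similar_pair(NEIGHBORHOODS))
print("most different:", most_different_pair(NEIGHBORHOODS))
```

If no pair meets the condition, `min` or `max` raises a `ValueError`, which is a reasonable signal that the pool of cases cannot support the chosen design.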

9.3.3 Typical Case

The typical case is a common or representative case that a theory explains. According to Gerring and Seawright, the typical case should be well defined by an existing model, which allows the researcher to observe problems within the case rather than relying on any particular comparison. A typical case is well suited to confirming or disconfirming particular theories. Referring back to Reuel Rogers's work on black Caribbean immigrants in New York City, Rogers was able to disconfirm Dahl's argument that pluralism allows for the eventual full inclusion of immigrants by pointing to the racism and discrimination that hinder black Caribbean immigrants from being fully incorporated into the American polity. What is most important for understanding the typical case is that it is representative, and that this representativeness must be situated within existing models and theories to be useful.

9.3.4 Deviant Case

In contrast to the typical case, the deviant case cannot be explained by existing theory. A researcher can have one or more deviant cases, and these cases serve mainly as a means of exploration and of confirming variation within cases. The deviant case essentially checks for anomalies within an established theory and allows the researcher to find previously unidentified explanations in particular cases. An example may be finding that liberalism is defined differently by different populations, which runs counter to Hartz's assertion that liberalism assumes a certain amount of unity throughout the country. What is most important about the deviant case is that it lets a researcher check how representative a theory really is, and much of the value of small n methods lies here. A researcher can tell the story of a particular group that is often assumed to fit the general understandings of political science but that, through the use of qualitative methods, is shown to be more complex than previously understood.
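Typical and deviant cases are often identified from a cross-case model: a typical case sits close to the model's prediction, and a deviant case sits far from it. The Python sketch below is a minimal illustration under stated assumptions. It fits an ordinary least squares line with a single predictor to invented data for eight hypothetical cases (N1 to N8), then picks the case with the smallest absolute residual as typical and the case with the largest as deviant. A real application would rest on a substantively justified model, usually with more than one predictor.

```python
# Hypothetical cross-case data: (predictor x, outcome y) for eight cases.
CASES = {
    "N1": (1.0, 2.1), "N2": (2.0, 3.8), "N3": (3.0, 6.1), "N4": (4.0, 14.0),
    "N5": (5.0, 10.2), "N6": (6.0, 11.9), "N7": (7.0, 14.1), "N8": (8.0, 15.8),
}


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(cases):
    """Observed minus predicted outcome for each case."""
    intercept, slope = fit_line(list(cases.values()))
    return {name: y - (intercept + slope * x) for name, (x, y) in cases.items()}


def typical_and_deviant(cases):
    """Typical: smallest absolute residual. Deviant: largest absolute residual."""
    res = residuals(cases)
    typical = min(res, key=lambda name: abs(res[name]))
    deviant = max(res, key=lambda name: abs(res[name]))
    return typical, deviant


print(typical_and_deviant(CASES))
```

Here N4 sits far above the trend, so it would be the candidate for a deviant case study, while N7 lies closest to the fitted line.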

9.3.5 Other Selection Approaches

Along with the four main case selection procedures, there are three other approaches worth noting. The first is the extreme case. Extreme cases are those with very high or very low values on a researcher's key independent or dependent variable. Selecting very low and very high cases maximizes variation on the dimensions of interest, which provides a means to better understand and explore a phenomenon (Seawright and Gerring 2008). Unlike in linear regression, where extreme values can provide an incomplete or inaccurate picture, in small n approaches extreme cases can deepen the understanding of a phenomenon by focusing on its most extreme instances (Collier, Mahoney, and Seawright 2004, 4-5).
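One common way to make "very high or very low" concrete is to standardize a variable and rank cases by their distance from the mean in standard deviations (a z-score). The sketch below assumes invented scores for six hypothetical cases and uses Python's standard `statistics` module; `extreme_cases` is an illustrative helper, not a procedure taken from the source.

```python
from statistics import mean, pstdev

# Hypothetical scores on a key variable for six cases.
SCORES = {"N1": 12.0, "N2": 15.0, "N3": 14.0, "N4": 41.0, "N5": 13.0, "N6": 2.0}


def z_scores(values):
    """Distance of each case from the mean, in population standard deviations."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    return {name: (v - mu) / sd for name, v in values.items()}


def extreme_cases(values, k=2):
    """The k cases farthest from the mean, in either direction."""
    z = z_scores(values)
    return sorted(z, key=lambda name: abs(z[name]), reverse=True)[:k]


print(extreme_cases(SCORES))
```

Ranking by absolute z-score picks up both ends of the distribution, here a very high case (N4) and a very low one (N6).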

Second, diverse cases highlight the range of possible values. A researcher can choose cases that are low, medium, and high on their independent variable to illustrate the range of possibilities. Two or more cases are needed, and this procedure mainly serves as a method for developing new hypotheses. These cases are minimally representative of the entire population.
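For a single continuous variable, the low/medium/high selection can be sketched by ranking cases and taking the lowest, a middle, and the highest case. The incomes and neighborhood labels below are invented for illustration, and `diverse_cases` is a hypothetical helper; with categorical variables a researcher would instead pick one case per category.

```python
# Hypothetical median household incomes (thousands of dollars) for five neighborhoods.
INCOMES = {"N1": 28, "N2": 95, "N3": 51, "N4": 40, "N5": 67}


def diverse_cases(values):
    """Pick the lowest, a middle, and the highest case on one variable."""
    ranked = sorted(values, key=values.get)
    return {"low": ranked[0], "medium": ranked[len(ranked) // 2], "high": ranked[-1]}


print(diverse_cases(INCOMES))
```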

Lastly, influential cases are outliers in the sense that they are not typical and may be playing an outsized role in a researcher's results. Because identifying influential cases relies on large n methods, this procedure is unlikely to play a significant role in small n research.
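In large n work, influential cases are usually flagged with regression diagnostics. One standard diagnostic is Cook's distance, which measures how much a fitted regression would change if a single case were dropped. The Python sketch below computes it for a one-predictor least squares fit on invented data for eight hypothetical cases; it is meant to show why this procedure depends on a large n model, not to replace a statistics package.

```python
# Hypothetical (x, y) data; N8 is far from the other cases on x and off the trend.
CASES = {
    "N1": (1.0, 1.1), "N2": (2.0, 2.0), "N3": (3.0, 2.9), "N4": (4.0, 4.2),
    "N5": (5.0, 5.0), "N6": (6.0, 5.9), "N7": (7.0, 7.1), "N8": (15.0, 5.0),
}


def cooks_distances(cases):
    """Cook's distance for each case under a simple linear regression y = a + b * x."""
    points = list(cases.values())
    n, p = len(points), 2  # p counts the estimated intercept and slope
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    intercept = my - slope * mx
    resid = {name: y - (intercept + slope * x) for name, (x, y) in cases.items()}
    s2 = sum(e ** 2 for e in resid.values()) / (n - p)  # residual variance
    distances = {}
    for name, (x, _) in cases.items():
        h = 1 / n + (x - mx) ** 2 / sxx  # leverage of this case
        distances[name] = resid[name] ** 2 / (p * s2) * h / (1 - h) ** 2
    return distances


d = cooks_distances(CASES)
print(max(d, key=d.get))
```

N8 dominates because it combines high leverage (an unusual x value) with a large residual; a case with only one of these properties usually has far less influence.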

Check-in Question 1: How should a researcher go about choosing a case selection procedure?

9.4 Method: setup/overview

Small n methods are characterized by an emphasis on detail. A researcher has to be able to see the environment they are studying, because the purpose of small n methods is to gain in-depth knowledge of particular cases. Field notes will be a researcher's best friend. A researcher should take notes on the demographics, noises, emotions, mores, and much more to gain an accurate understanding of the population they are studying. Additionally, small n methods are about building rapport with the population being studied and constantly taking into account one's own biases and thoughts while conducting fieldwork. It is not uncommon for researchers to eventually live in the places they are studying. During her work on the black middle class, Mary Pattillo eventually moved into Groveland, a neighborhood on the South Side of Chicago and the subject of her book Black Picket Fences (Pattillo 2013). Pattillo attended community meetings, shopped, and cultivated lasting relationships with the community, all of which guided her research. There is a level of intimacy needed to do good small n research: not always to the extent of living with one's participants, but still an insight that goes beyond a shallow understanding of a particular community. Small n qualitative researchers get at these insights through several methods.

Note: Take some time to think about your own research. What are you noticing during your fieldwork? How is this informing your study?

9.5 Method: types

The typical methods used in small n research are interviews, participant observation, focus groups, process tracing, and ethnography. Each method has its advantages and disadvantages, and a researcher can use more than one of these methods depending on the aims of their research. In deciding on a small n method, a researcher must consider the goals of the research, its validity, and the conceptual framework that will feed the researcher's broader question. The diagram below illustrates that a small n qualitative researcher should be purposeful in their research design. They must consider their overall question, specify the goals of their research, consider the theories driving the conceptual framework of their research, and consider the validity (does it make sense?) of their research design.


Figure 9.1: Research Methods Diagram

Focusing on the methods portion of the diagram, this chapter will discuss in further detail each small n qualitative method.

9.5.1 Interviews

Conducting interviews can seem like a daunting experience. A researcher has to develop comfort in approaching diverse sets of people, often in unfamiliar environments. A researcher has to be able to build rapport, get their questions answered within a limited amount of time, and encourage the participant to elaborate on and clarify answers. Interviews are challenging, but the good news is that there are ways to make the process smoother through organization, commitment, and earnestness.

Before contacting anyone for an interview, a researcher should take some time to organize their interview guide and decide whether they want to conduct structured or semi-structured interviews. The interview guide lists the questions and themes the researcher plans to cover during the interview. Its format depends on whether the researcher has a rigid set of questions they plan to ask each participant (structured interview) or a more flexible strategy that allows the researcher to deviate from the questions and pursue a more exploratory conversation within the confines of the research question (semi-structured interview).

Once a researcher has decided on an interview structure and completed their interview guide, they can decide whom they want to recruit. The researcher will need to consider the key informants and the representative sample they want to recruit. Key informants are experts who can discuss the population of interest, including but not limited to academics, community leaders, and politicians. The representative sample is the population on which the research is based. For example, Wendy Pearlman's We Crossed a Bridge and It Trembled: Voices from Syria draws on a representative sample of Syrians displaced during the civil war (Pearlman 2017). What is important to understand about the difference between the representative sample and key informants is that members of the sample give a firsthand account of their own experiences, while a key informant mainly gives their observations of the sample's experiences from an outside perspective.

Moving on to recruitment, Robert Weiss's Learning From Strangers lists several factors that affect whether an individual is willing to participate in an interview, including occupation, region, retirement status, vulnerability, and sponsorship from others within their network (Weiss 1994). Unfortunately, there is no easy way to recruit, but from experience, face-to-face discussions with potential participants and immediate follow-up are quite effective. Researchers can also use snowball sampling, in which previous participants refer acquaintances from their own networks to participate in interviews. These strategies are not foolproof, but a layer of personal interaction through face-to-face contact or networks does make many people more receptive to participating in interviews.
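The referral logic of snowball sampling resembles a breadth-first walk through a network: start from a few seed participants, then recruit the people they refer, in the order the referrals arrive, until the target number of interviews is reached. The Python sketch below assumes a hypothetical referral map with invented names; in real fieldwork the network is discovered one conversation at a time, and referred people can decline, which this sketch ignores.

```python
from collections import deque

# Hypothetical referrals: each participant lists the acquaintances they would refer.
REFERRALS = {
    "Seed A": ["P1", "P2"],
    "Seed B": ["P2", "P3"],
    "P1": ["P4"],
    "P2": ["P5", "Seed A"],
    "P3": [],
    "P4": ["P6"],
    "P5": [],
    "P6": [],
}


def snowball_sample(network, seeds, target):
    """Recruit seeds first, then their referrals breadth-first, skipping repeats."""
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < target:
        person = queue.popleft()
        recruited.append(person)
        for contact in network.get(person, []):
            if contact not in seen:  # avoid interviewing the same person twice
                seen.add(contact)
                queue.append(contact)
    return recruited


print(snowball_sample(REFERRALS, ["Seed A", "Seed B"], target=6))
```

The sketch also makes a familiar limitation visible: everyone recruited is reachable from the seeds, so the sample inherits whatever biases the seeds' networks carry.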

Lastly, when the day of the interview finally arrives, a researcher should have two recorders, tissues, the interview guide, a consent form, and, if possible, a gift card for the participant. Out of respect for the participant's time, the interview should not take longer than an hour. A researcher should take meticulous notes during the interview. The researcher must also obtain the participant's permission to conduct a follow-up interview if necessary.

Check-in Question 2: What is the difference between a representative sample and a key informant?

9.5.2 Participant Observation

Participant observation is a variation of ethnographic research in which the researcher participates in an organization, community, or other group-oriented activity as a member of the community. Typically used in anthropology, it involves a researcher immersing themselves within a community. Participant observation requires the researcher to build a strong bond of trust with the observed community. A researcher (with the help of an institutional review board, or IRB) will need to decide whether participation will be active or passive and whether it should be overt or covert. This can be a particularly sticky situation: passive and covert observation may mean community members have no idea they are being studied, while active and overt participation can lead the environment to change because the community is aware of the presence and role of the researcher. Referring back to the work of Mary Pattillo, recall that she eventually became a resident of Groveland and participated in community activities like any other resident (Pattillo 2013). This included leading the local church choir, joining the community's local action group, and coaching cheerleading at the local park. Pattillo saw her participant observation as essential to describing the black middle class in Groveland and even speaks of the parallels between the Groveland neighborhood and her upbringing in Milwaukee.

The key purpose of participant observation is to provide deeper insight into processes and how things function. This exercise is good for 'theory building,' but it may be best to include another method, such as interviewing, to allow the community to tell its own story as well; Pattillo uses this supplemental method too. What is most important when using participant observation (and qualitative methods in general) is to take meticulous field notes with attention to accuracy. A researcher should be cognizant of their own biases and constantly think through their analysis to make sure they are capturing an accurate story. To tell an accurate story, a researcher should keep both mental notes and a notepad. After an event ends, it is important to write everything down while the researcher's memory is fresh.

Check-in Question 3: What are the advantages and disadvantages of covert and overt participant observation?

9.5.3 Focus Groups

Focus groups, like individual interviews, require a researcher to set questions, recruit participants, and follow up with participants as necessary. As with an individual interview, the researcher should have an interview guide to help structure the questions and themes of the focus group. The advantage of a focus group is that a researcher is able to facilitate a discussion among multiple respondents at once, which can yield details and information that might not emerge from a series of single interviews. As seen in Melissa Harris-Perry's Sister Citizen, focus groups are great for spurring discussion about topics such as stereotypes (Harris-Perry 2011). A researcher should note impressions, points of contention, and general interactions within the group. Group dynamics and discussions can be used for theory building as well as for gaining a deeper understanding of a particular group of people.

9.5.4 Process Tracing

Process tracing is a method of causal inference that uses descriptive inference over time. Notably used by APD scholars, process tracing aims to collect evidence to evaluate a set of hypotheses through the framing of historical events. Four tests are commonly used in process tracing.

The first is the straw in the wind test. It provides neither a necessary nor a sufficient criterion for accepting or rejecting a hypothesis: passing it can increase a hypothesis's plausibility, and failing it can weaken the hypothesis, but neither result is decisive. The hoop test establishes a necessary criterion. Though passing the hoop test does not confirm a hypothesis, failing it eliminates the hypothesis. The smoking gun test provides a sufficient but not necessary criterion. Passing it gives strong support for a given hypothesis and can substantially weaken competing hypotheses, while failing it does not eliminate the hypothesis. Lastly, the doubly decisive test provides evidence that is both necessary and sufficient: passing it confirms the hypothesis and eliminates rivals, while failing it eliminates the hypothesis. In general terms, a condition is necessary when the outcome cannot occur without it, and sufficient when the outcome always occurs whenever the condition is present.
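The four tests differ only in whether passing them is necessary and whether it is sufficient, so their implications can be summarized in a small lookup. This Python sketch encodes the logic described above; the `EvidenceTest` class and `interpret` function are hypothetical names for illustration, and the one-line verdicts are simplifications of judgments that in practice depend on how strong the evidence is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceTest:
    name: str
    necessary: bool   # must the hypothesis pass this test to remain viable?
    sufficient: bool  # is passing this test enough to confirm the hypothesis?


TESTS = [
    EvidenceTest("straw in the wind", necessary=False, sufficient=False),
    EvidenceTest("hoop", necessary=True, sufficient=False),
    EvidenceTest("smoking gun", necessary=False, sufficient=True),
    EvidenceTest("doubly decisive", necessary=True, sufficient=True),
]


def interpret(test, passed):
    """Summarize what passing or failing a test implies for the hypothesis."""
    if passed:
        if test.necessary and test.sufficient:
            return "confirms the hypothesis and eliminates rivals"
        if test.sufficient:
            return "confirms the hypothesis"
        return "raises plausibility but does not confirm"
    if test.necessary:
        return "eliminates the hypothesis"
    return "weakens the hypothesis but does not eliminate it"


for test in TESTS:
    print(f"{test.name}: pass -> {interpret(test, True)}; fail -> {interpret(test, False)}")
```

Laid out this way, the asymmetry is easy to see: only tests with a necessary criterion can eliminate a hypothesis, and only tests with a sufficient criterion can confirm one.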

What is important to understand about process tracing, beyond the individual tests, is that it is a good way in political science to draw evidence about certain events and phenomena. Chloe Thurston uses process tracing to track the development of public-private partnerships in housing policy (Thurston 2018). Drawing on numerous historical texts, including archives, testimonies, and presidential records, Thurston develops a story of how public-private partnerships led to homeownership policies that discriminated according to gender, race, and socioeconomic status, and of how advocacy groups were able to combat these policies.

Thus, process tracing looks for historical evidence to explain certain events or policies.

9.5.5 Ethnography

Ethnography involves studying groups of people and their experiences (Emerson, Fretz, and Shaw 2011). As mentioned earlier in the discussion of participant observation, the purpose of ethnography is for a researcher to immerse themselves in the environment they are studying. The researcher will need to develop relationships with the community and detail the environment through constant note taking and reflection. This is reflected in the work of many of the researchers already discussed in this chapter. Done correctly, ethnography allows a researcher to document the emotions, attitudes, and relationships in a community that are sometimes impossible to capture in quantitative work.

In Wounded City: Violent Turf Wars in a Chicago Barrio, Robert Vargas captures the fear, frustration, and empowerment felt by the residents of Chicago's Little Village as they negotiate turf wars among gangs, police, and aldermen (Vargas 2016). The insights he gathers could not simply be obtained through a survey; they had to be observed in the environment, which required developing trust within the community.

Ethnography is about relationship building and allows for latent findings that may give proper context for understanding particular groups. This is especially important for underrepresented communities, where in-depth research is often lacking and community members may be unlikely to respond to a survey under less personal circumstances. Ethnography allows a researcher to take a more holistic approach to understanding a community.

Check-in Question 4: What should a researcher be looking for when taking ethnographic field notes?

9.6 Applications

The application of small n qualitative methods depends on a researcher's question. Sociologist Celeste Watkins-Hayes explains that qualitative research is meant to tell specific stories about a community. Returning to the diagram displayed earlier in the chapter, a researcher should think about the story they are trying to tell and their goals, whether the small n qualitative methods they want to use are valid, and how all of this relates to the research question. Most importantly, when applying small n qualitative methods, record keeping is essential. A researcher should make sure that their field notes are detailed and capture an accurate depiction of the environment of study. This means not only reflecting on one's own biases, but also using multiple small n and quantitative methods when appropriate to tell the most complete story possible. Lastly, a researcher needs a method of coding the themes and messages found in their study. Recording encounters and taking good field notes go a long way toward creating an organized system, which allows a researcher to tell an accurate story that captures the nuances and characteristics of a particular community.
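Part of organizing coded material can be sketched in code: once a researcher has a codebook, excerpts can be tagged and themes counted. The Python example below is a deliberately crude first pass, assuming a hypothetical codebook and invented field-note excerpts; it matches keywords only, whereas real qualitative coding is an interpretive judgment made by the researcher, often with dedicated qualitative analysis software.

```python
import re
from collections import Counter

# Hypothetical codebook: each theme maps to keywords that may signal it.
CODEBOOK = {
    "policing": {"police", "officer", "patrol"},
    "housing": {"rent", "landlord", "eviction"},
    "discrimination": {"discrimination", "racism", "profiling"},
}

# Hypothetical field-note excerpts.
NOTES = [
    "Resident described a police patrol stopping teenagers near the park.",
    "Two residents worried about rent increases and a new landlord.",
    "Meeting turned to racial profiling by police officers.",
]


def code_notes(notes, codebook):
    """Tag each note with the themes whose keywords it contains and count notes per theme."""
    tagged = []
    counts = Counter()
    for note in notes:
        words = set(re.findall(r"[a-z]+", note.lower()))
        themes = sorted(theme for theme, keywords in codebook.items() if words & keywords)
        tagged.append((note, themes))
        counts.update(themes)
    return tagged, counts


tagged, counts = code_notes(NOTES, CODEBOOK)
for note, themes in tagged:
    print(themes, "<-", note)
print(counts.most_common())
```

Note that "officers" does not match the keyword "officer"; limitations like this are why keyword tagging can only support, never replace, the researcher's own reading of the notes.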

9.7 Advantages of Method

Small n qualitative research thrives on gaining in-depth information about a limited number of cases. This allows a researcher to provide insight into a small number of communities that may be missing from large n studies. In the same breath, small n methods allow for theory building that often departs from lessons taken for granted in the discipline of political science. It is one thing to ask an individual participant to check an answer to a question about immigration, race, or the president. Yet there is value in going deeper and wrestling with the values, contradictions, and historical and present-day context that make up the politics of a particular people. It is through small n methods that researchers are able to better understand topics such as rural consciousness, neighborhood violence, and linked fate. Small n methods allow a researcher to tell the stories that are often ignored, unheard, or misinterpreted by other methods.

9.8 Disadvantages of Method

The major disadvantage of small n methods is that a researcher is working from a small pool. This should not be confused with having less data: interviews, field notes, and archives provide an abundance of data, but the sources are limited. A responsible researcher will have to consider whether their case selection is representative of the broader community and how best to hear from a diverse set of voices to avoid inaccurate assessments of a community. Thus, it is difficult (but not impossible) to generalize from small n research. Including quantitative methods or multiple small n methods in a study will go a long way toward strengthening a researcher's arguments.

9.9 Broader significance/use in political science

As has been noted numerous times in this chapter, small n qualitative methods allow a researcher to explore groups that cannot necessarily be understood merely through a survey, experiment, or causal inference. Small n methods allow a researcher to go into more detail about groups that cannot be fully understood through quantitative research, either because they are too small or because they are too unresponsive to quantitative methods. Additionally, small n qualitative research allows political scientists to consider context and history when developing claims about the political behaviors and institutions that shape society. This context can help a political scientist go beyond superficial understandings of particular groups. For instance, Michael Dawson's Black Visions uses quantitative methods to show that African Americans have high support for Black Nationalism (Dawson 2001). This finding alone could be taken as an example of mass black prejudice, as Black Nationalism has been associated most notably with the bigoted views of Louis Farrakhan. Yet Dawson takes care to include the historical context, including testimonials by leading black thinkers that detail the long history of debate concerning Black Nationalism, as well as the economic violence and discrimination committed against the black community that has led to support for some forms of Black Nationalism. Small n qualitative research, through the use of history, interviews, and ethnography, allows for the telling of these stories, adding complexity and nuance to many of political science's well-established theories and perceptions.

9.10 Conclusion

Not all questions can be answered with a survey or experiment alone. Sometimes a deeper study of a community or event can lead to new and exciting insights in the discipline of political science. Admittedly, small n qualitative research can be met with some cynicism in certain parts of the political science community, but when done correctly, through meticulous note taking, coding, and preparation, small n qualitative methods can provide insights that have yet to be fully articulated in the discipline and assist in answering some of the most important questions of the day, including those about policing, immigration, and race relations.

9.11 Application Questions

Application Question 1

What are some materials needed to conduct small n research?

Application Question 2

When in the field, how does a researcher build rapport with the community?

9.12 Key Terms


covert observation

deviant case

diverse cases

ethnography

extreme case

focus groups

influential cases

key informant

most different

most similar

overt observation

participant observation

process tracing

representative sample

snowball sampling

smoking gun test

straw in the wind test

typical case

9.13 Answers to Application Questions

A researcher should have their interview guide prepared, tissues, and two recorders if conducting interviews or focus groups. Additionally, a researcher should have a notepad for field notes and consent forms if necessary. Business cards are also useful when trying to recruit participants from the field.

Rapport can be built through appearance, including dress and racial, gender, regional, and class markers. Most importantly, a researcher should present themselves as engaged and attentive to the participants. A researcher should remain professional and read the room; rapport building with a group of blue-collar workers may be different than with college students. A researcher should remain cognizant of this distinction and look for openings to build connections when possible.


Chapter 10: Single-Subject Research

Overview of Single-Subject Research

Learning Objectives

  • Explain what single-subject research is, including how it differs from other types of psychological research.
  • Explain what case studies are, including some of their strengths and weaknesses.
  • Explain who uses single-subject research and why.

What Is Single-Subject Research?

Single-subject research is a type of quantitative research that involves studying in detail the behaviour of each of a small number of participants. Note that the term single-subject does not mean that only one participant is studied; it is more typical for there to be somewhere between two and 10 participants. (This is why single-subject research designs are sometimes called small-n designs, where n is the statistical symbol for the sample size.) Single-subject research can be contrasted with group research, which typically involves studying large numbers of participants and examining their behaviour primarily in terms of group means, standard deviations, and so on. The majority of this textbook is devoted to understanding group research, which is the most common approach in psychology. But single-subject research is an important alternative, and it is the primary approach in some areas of psychology.

Before continuing, it is important to distinguish single-subject research from two other approaches, both of which involve studying in detail a small number of participants. One is qualitative research, which focuses on understanding people’s subjective experience by collecting relatively unstructured data (e.g., detailed interviews) and analyzing those data using narrative rather than quantitative techniques. Single-subject research, in contrast, focuses on understanding objective behaviour through experimental manipulation and control, collecting highly structured data, and analyzing those data quantitatively.

It is also important to distinguish single-subject research from case studies. A case study is a detailed description of an individual, which can include both qualitative and quantitative analyses. (Case studies that include only qualitative analyses can be considered a type of qualitative research.) The history of psychology is filled with influential case studies, such as Sigmund Freud’s description of “Anna O.” (see Note 10.5, “The Case of ‘Anna O.’”) and John Watson and Rosalie Rayner’s description of Little Albert (Watson & Rayner, 1920) [1], who learned to fear a white rat, along with other furry objects, when the researchers made a loud noise while he was playing with the rat. Case studies can be useful for suggesting new research questions and for illustrating general principles. They can also help researchers understand rare phenomena, such as the effects of damage to a specific part of the human brain. As a general rule, however, case studies cannot substitute for carefully designed group or single-subject research studies. One reason is that case studies usually do not allow researchers to determine whether specific events are causally related, or even related at all. For example, if a patient is described in a case study as having been sexually abused as a child and then as having developed an eating disorder as a teenager, there is no way to determine whether these two events had anything to do with each other. A second reason is that an individual case can always be unusual in some way and therefore be unrepresentative of people more generally. Thus, case studies have serious problems with both internal and external validity.

The Case of “Anna O.”

Sigmund Freud used the case of a young woman he called “Anna O.” to illustrate many principles of his theory of psychoanalysis (Freud, 1961) [2]. (Her real name was Bertha Pappenheim, and she was an early feminist who went on to make important contributions to the field of social work.) Anna had come to Freud’s colleague Josef Breuer around 1880 with a variety of odd physical and psychological symptoms. One of them was that for several weeks she was unable to drink any fluids. According to Freud,

She would take up the glass of water that she longed for, but as soon as it touched her lips she would push it away like someone suffering from hydrophobia.…She lived only on fruit, such as melons, etc., so as to lessen her tormenting thirst. (p. 9)

But according to Freud, a breakthrough came one day while Anna was under hypnosis.

[S]he grumbled about her English “lady-companion,” whom she did not care for, and went on to describe, with every sign of disgust, how she had once gone into this lady’s room and how her little dog—horrid creature!—had drunk out of a glass there. The patient had said nothing, as she had wanted to be polite. After giving further energetic expression to the anger she had held back, she asked for something to drink, drank a large quantity of water without any difficulty, and awoke from her hypnosis with the glass at her lips; and thereupon the disturbance vanished, never to return. (p. 9)

Freud’s interpretation was that Anna had repressed the memory of this incident along with the emotion that it triggered and that this was what had caused her inability to drink. Furthermore, her recollection of the incident, along with her expression of the emotion she had repressed, caused the symptom to go away.

As an illustration of Freud’s theory, the case study of Anna O. is quite effective. As evidence for the theory, however, it is essentially worthless. The description provides no way of knowing whether Anna had really repressed the memory of the dog drinking from the glass, whether this repression had caused her inability to drink, or whether recalling this “trauma” relieved the symptom. It is also unclear from this case study how typical or atypical Anna’s experience was.

[Image: Bertha Pappenheim (“Anna O.”), 1882, in a floor-length dress with long sleeves, holding a long white stick.]

Assumptions of Single-Subject Research

Again, single-subject research involves studying a small number of participants and focusing intensively on the behaviour of each one. But why take this approach instead of the group approach? There are several important assumptions underlying single-subject research, and it will help to consider them now.

First and foremost is the assumption that it is important to focus intensively on the behaviour of individual participants. One reason for this is that group research can hide individual differences and generate results that do not represent the behaviour of any individual. For example, a treatment that has a positive effect for half the people exposed to it but a negative effect for the other half would, on average, appear to have no effect at all. Single-subject research, however, would likely reveal these individual differences. A second reason to focus intensively on individuals is that sometimes it is the behaviour of a particular individual that is primarily of interest. A school psychologist, for example, might be interested in changing the behaviour of a particular disruptive student. Although previous published research (both single-subject and group research) is likely to provide some guidance on how to do this, conducting a study on this student would be more direct and probably more effective.
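The averaging problem described above can be made concrete with a few lines of arithmetic. The Python sketch below uses invented effect scores for eight hypothetical participants (the numbers are assumptions for illustration, not data from any study): half improve by 5 points and half worsen by 5 points, so the group mean reports no effect even though every individual changed.

```python
# Hypothetical treatment effects (treatment score minus baseline score)
# for eight imaginary participants: half improve by 5 points, half
# worsen by 5 points. These numbers are invented for illustration.
individual_effects = [5, 5, 5, 5, -5, -5, -5, -5]


def group_mean(effects):
    """Average effect across participants, as a group design would report it."""
    return sum(effects) / len(effects)


def nonzero_individuals(effects, threshold=0):
    """Count participants whose own effect exceeds the threshold in magnitude."""
    return sum(1 for e in effects if abs(e) > threshold)


mean_effect = group_mean(individual_effects)
changed = nonzero_individuals(individual_effects)

print(f"Group mean effect: {mean_effect}")  # 0.0
print(f"Participants who changed: {changed} of {len(individual_effects)}")
```

Looking at each participant separately, as single-subject research does, reveals two opposite effects that the single summary number hides.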

A second assumption of single-subject research is that it is important to discover causal relationships through the manipulation of an independent variable, the careful measurement of a dependent variable, and the control of extraneous variables. For this reason, single-subject research is often considered a type of experimental research with good internal validity. Recall, for example, that Hall and his colleagues measured their dependent variable (studying) many times—first under a no-treatment control condition, then under a treatment condition (positive teacher attention), and then again under the control condition. Because there was a clear increase in studying when the treatment was introduced, a decrease when it was removed, and an increase when it was reintroduced, there is little doubt that the treatment was the cause of the improvement.
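The reversal logic can be sketched as a small calculation. The Python below uses invented session-by-session scores (hypothetical, not the data of Hall and his colleagues) for one student across four phases, summarizes each phase by its mean, and checks for the up, down, up pattern. This is a simplification: single-subject researchers typically judge level, trend, and variability by inspecting a graph, and a phase mean captures only the level.

```python
from statistics import mean

# Hypothetical percentage-of-intervals-studying scores for one student,
# session by session, in a reversal (ABAB) design. Invented numbers.
phases = {
    "A1 (baseline)": [25, 30, 20, 28],
    "B1 (treatment)": [60, 70, 72, 75],
    "A2 (baseline)": [35, 30, 28, 26],
    "B2 (treatment)": [70, 78, 80, 82],
}

# Summarize each phase by its mean level (dicts keep insertion order).
phase_means = {name: mean(scores) for name, scores in phases.items()}
for name, m in phase_means.items():
    print(f"{name}: mean = {m:.1f}")

a1, b1, a2, b2 = phase_means.values()

# The pattern the reversal logic looks for: up when treatment starts,
# down when it is withdrawn, up again when it is reintroduced.
reversal_pattern = b1 > a1 and a2 < b1 and b2 > a2
print("Reversal pattern present:", reversal_pattern)
```

Because the same student serves as his or her own control in every phase, a change that tracks the introduction and removal of the treatment is hard to attribute to anything else.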

A third assumption of single-subject research is that it is important to study strong and consistent effects that have biological or social importance. Applied researchers, in particular, are interested in treatments that have substantial effects on important behaviours and that can be implemented reliably in the real-world contexts in which they occur. This is sometimes referred to as social validity (Wolf, 1976) [3]. The study by Hall and his colleagues, for example, had good social validity because it showed strong and consistent effects of positive teacher attention on a behaviour that is of obvious importance to teachers, parents, and students. Furthermore, the teachers found the treatment easy to implement, even in their often-chaotic elementary school classrooms.

Who Uses Single-Subject Research?

Single-subject research has been around as long as the field of psychology itself. In the late 1800s, one of psychology’s founders, Wilhelm Wundt, studied sensation and consciousness by focusing intensively on each of a small number of research participants. Hermann Ebbinghaus’s research on memory and Ivan Pavlov’s research on classical conditioning are other early examples, both of which are still described in almost every introductory psychology textbook.

In the middle of the 20th century, B. F. Skinner clarified many of the assumptions underlying single-subject research and refined many of its techniques (Skinner, 1938) [4]. He and other researchers then used it to describe how rewards, punishments, and other external factors affect behaviour over time. This work was carried out primarily using nonhuman subjects, mostly rats and pigeons. This approach, which Skinner called the experimental analysis of behaviour, remains an important subfield of psychology and continues to rely almost exclusively on single-subject research. For excellent examples of this work, look at any issue of the Journal of the Experimental Analysis of Behaviour. By the 1960s, many researchers were interested in using this approach to conduct applied research primarily with humans, a subfield now called applied behaviour analysis (Baer, Wolf, & Risley, 1968) [5]. Applied behaviour analysis plays an especially important role in contemporary research on developmental disabilities, education, organizational behaviour, and health, among many other areas. Excellent examples of this work (including the study by Hall and his colleagues) can be found in the Journal of Applied Behaviour Analysis.

Although most contemporary single-subject research is conducted from the behavioural perspective, it can in principle be used to address questions framed in terms of any theoretical perspective. For example, a studying technique based on cognitive principles of learning and memory could be evaluated by testing it on individual high school students using the single-subject approach. The single-subject approach can also be used by clinicians who take any theoretical perspective (behavioural, cognitive, psychodynamic, or humanistic) to study processes of therapeutic change with individual clients and to document their clients’ improvement (Kazdin, 1982) [6].

Key Takeaways

  • Single-subject research—which involves testing a small number of participants and focusing intensively on the behaviour of each individual—is an important alternative to group research in psychology.
  • Single-subject studies must be distinguished from case studies, in which an individual case is described in detail. Case studies can be useful for generating new research questions, for studying rare phenomena, and for illustrating general principles. However, they cannot substitute for carefully controlled experimental or correlational studies because they are low in internal and external validity.
  • Single-subject research has been around since the beginning of the field of psychology. Today it is most strongly associated with the behavioural theoretical perspective, but it can in principle be used to study behaviour from any perspective.
  • Practice: Find and read a published article in psychology that reports new single-subject research. (An archive of articles published in the Journal of Applied Behaviour Analysis can be found at http://www.ncbi.nlm.nih.gov/pmc/journals/309/.) Write a short summary of the study.
  • Practice: Find and read a published case study in psychology. Then complete the following three items.
  • Describe one problem related to internal validity.
  • Describe one problem related to external validity.
  • Generate one hypothesis suggested by the case study that might be interesting to test in a systematic single-subject or group study.

Media Attributions

  • Pappenheim 1882 by unknown is in the Public Domain.

Notes

  • [1] Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3, 1–14.
  • [2] Freud, S. (1961). Five lectures on psycho-analysis. New York, NY: Norton.
  • [3] Wolf, M. (1976). Social validity: The case for subjective measurement or how applied behaviour analysis is finding its heart. Journal of Applied Behaviour Analysis, 11, 203–214.
  • [4] Skinner, B. F. (1938). The behaviour of organisms: An experimental analysis. New York, NY: Appleton-Century-Crofts.
  • [5] Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behaviour analysis. Journal of Applied Behaviour Analysis, 1, 91–97.
  • [6] Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.

Glossary

Single-subject research: A type of quantitative research that involves studying in detail the behaviour of each of a small number of participants.

Group research: The study of large numbers of participants, examining their behaviour primarily in terms of group means, standard deviations, and so on.

Case study: A detailed description of an individual, which can include both qualitative and quantitative analyses.

Social validity: The study of strong and consistent effects that can be implemented reliably in the real-world contexts in which they occur.

Experimental analysis of behaviour: Laboratory methods that rely on single-subject research, based on B. F. Skinner’s philosophy of behaviourism, which posits that everything organisms do is behaviour.

Applied behaviour analysis: The use of single-subject techniques, beginning in the 1960s, to conduct applied research with human subjects.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


  • Research article
  • Open access
  • Published: 22 November 2022

Remarkably reproducible psychological (memory) phenomena in the classroom: some evidence for generality from small-N research

  • Abdulrazaq A. Imam (ORCID: orcid.org/0000-0002-1262-6022)

BMC Psychology volume 10, Article number: 274 (2022)


Background

Mainstream psychology is experiencing a crisis of confidence. Many of the methodological solutions offered in response have focused largely on statistical alternatives to null hypothesis statistical testing, ignoring a nonstatistical remedy readily available within psychology, namely, the use of small-N designs. In fact, many classic memory studies that have passed the test of replicability used them. That methodological legacy warranted a retrospective look at nonexperimental data to explore the generality of the reported effects.

Methods

Classroom demonstrations based on classic memory experiments on immediate memory span, chunking, and depth of processing were conducted over multiple semesters in introductory psychology courses with typical, mostly freshman students at a predominantly white, private Catholic university in the US Midwest.

Results

Students tended to remember 7 ± 2 digits, remembered more digits of π after an attached meaningful story, and remembered more words after elaborative rehearsal than after maintenance rehearsal. These results amount to replications, in uncontrolled classroom environments, of classic experiments that were originally conducted largely outside null hypothesis statistical testing frameworks.

Conclusions

In light of the ongoing replication crisis in psychology, the results are remarkable and noteworthy, validating these historically important psychological findings. They are testament to the reliability of reproducible effects as the hallmark of empirical findings in science and suggest an alternative approach to commonly proffered solutions to the replication crisis.


“…a reproducible finding may not necessarily be true; however, a finding that fails reproduction or replication under identical conditions is most likely false. An additional factor operative in social sciences is the subjects’ beliefs and information available to them, which dilutes the concept of objective truth and exacerbates the epistemological divergence between reproducibility and validity of scientific results.” [ 1 ]

“It is possible that different psychological science subfields have different priors and different biases, so it would not be surprising if the proportion of unchallenged fallacies varies considerably across subfields (e.g., from 30 to 95%). Then, the remaining 66–1%, respectively, would be unconfirmed genuine discoveries. In all, the overall credibility of psychological science at the moment may be in serious trouble.” [ 2 ]

There is wide acknowledgement of a twin crisis in psychology and beyond (e.g., [ 3 ], see [ 2 ]), namely, widespread questionable research practices (QRPs) and failures to replicate or reproduce important findings in psychology, such as those on precognition [ 4 , 5 ] and priming [ 6 – 9 ]. The pervasive adoption of inferential statistics in the form of null hypothesis statistical testing (NHST) appears to be a contributing factor (see [ 10 ]), even as the second of these crises manifests to varying degrees across disciplines (e.g., [ 1 , 11 – 14 ], see [ 15 ]). In psychology, the much-proclaimed replication failures may have been, in part, a byproduct of the first crisis, in that QRPs naturally flowed out of the almost blanket adoption of NHST as a primary means of analyzing and evaluating data (see [ 1 , 16 ]). The adoption was only almost blanket because some areas of psychology, particularly behavioral psychology, took a wholly different approach to data analysis and evaluation. According to Smith and Little, there are pockets of use of this approach in cognitive psychology as well (see [ 17 , 18 ]). As such, psychology is probably unique in effectively having more than one research tradition. Nonetheless, the solutions adopted to deal with these crises have tended to address only one of these traditions, almost as if just one were in practice. Solutions surrounding the adoption of the “new statistics” [ 19 ], including advocacy for different replication efforts [ 10 , 20 ], have been tailored narrowly to address the ubiquity of NHST and its impacts on psychological research (see [ 21 ]). The two statistical alternatives typically offered for consideration, namely, the frequentist “new statistics” (e.g., [ 19 , 22 ]) and Bayesian statistics (e.g., [ 23 , 24 ]), actually belong to one tradition within psychology (see [ 25 ]), as elaborated below.

The import of the opening quotations to this section is twofold. On the one hand, psychology, like other social sciences, deals in human phenomena that necessarily open an epistemological gap between the replicability and the validity of its findings. On the other hand, although specific areas of psychology vary in how many false positives they produce, the net result is a credibility crisis that befalls the whole discipline. Some of the distinctions we draw among topics in psychology may be arbitrary and capricious. Such is the case with memory, which is ordinarily classified as a cognitive topic. This paper argues that the methodologies deployed to study memory in classical times would not be considered appropriate for its study today, largely because memory happens to be cognitive in today’s terms; standard mainstream methodologies involving group designs would typically be applied instead. Current subfield differentiations (e.g., between cognitive and behavioral), however, blur the historical and epistemological significance of the nexus between replicability and methodology, on the one hand, and between methodology and validity, on the other. Historically, certain aspects of memory have benefited from contact with different methodologies that afford an evaluation of their validity. The first part of the following review argues that there is indeed another, largely neglected option worthy of serious consideration (see [ 18 , 26 , 27 ]) in addition to the remedies currently on offer for dealing with the aforementioned crises. The final part of the review situates the memory phenomena reported here in the context of the historical reality of a dual research tradition in psychology.

Two research traditions in psychology

Broadly speaking, psychology has historically had two research traditions. The one predominant today involves large-N group designs. In this approach, researchers tend to begin with stated hypotheses, test them using experimental designs informed by specific statistical considerations and assumptions (which may or may not be fulfilled in practice), and then deploy data analyses and interpretations to answer them. Most of the time, the analysis involves NHST, which has been the subject of numerous and intensive criticisms for various pitfalls (see [ 28 – 33 ]). Although the goal of such research is to extrapolate from the sample to the population, the population is often not well defined, and there is substantial dependence on convenience samples made up largely of undergraduates (see [ 18 , 25 , 34 ]). Convenience sampling departs from the random-sampling assumptions, untenable in practice, on which statistical analyses rely to justify conclusions about observed effects [ 1 ]. Hanin made the case, for example, that “… (a) arbitrarily small deviations from the random sampling assumption can have arbitrarily large effects on the outcomes of statistical analyses, (b) the commonly occurring observations with random sample size may violate the Law of Large Numbers (LLN, which make them unsuitable for conventional statistical inference…” ([ 1 ], p. 2). In these and many other ways, one could fault psychologists for poorly using the best statistical tools (see also [ 35 ], p. 221).

Historically, the NHST approach represents a hybrid of two distinct statistical positions in psychology, namely, Fisher’s statistical significance testing (SST) and Neyman–Pearson’s statistical hypothesis testing (SHT; [ 36 ]). There were fundamental differences between the two, some of them irreconcilable, but the hybridization occurred nevertheless (see, e.g., [ 37 ]), usually without a hint of this history in statistics or methodology textbooks [ 1 ]. The outcome has been a terribly flawed process of interpreting psychological research findings [ 30 , 32 , 38 – 40 ]. One major flaw is the false conception of the p value as an index of confidence in the results; another is the seriously mistaken belief, held by many, that it represents the replicability of the results ([ 33 , 37 ], see also [ 41 ]). Perhaps partly because of this erroneous reading of the venerated p value as a measure of replicability, there have been the aforementioned failures to replicate and reproduce important psychological findings (e.g., [ 5 , 6 ]). These failures have prompted, on the one hand, new efforts to promote replications (see [ 20 , 42 ]) and, on the other, NHST alternatives such as the new statistics, which recommends using and reporting effect sizes, confidence intervals, and meta-analyses [ 19 , 22 , 43 – 47 ], and Bayesian statistics [ 23 , 24 , 48 ]. As Smith and Little [ 18 ] aptly observed, these efforts have inadvertently led various journals to demand larger and larger samples as a matter of policy, to the detriment of the science we seek to advance, particularly given the exemplary scientific features [ 45 ] of the alternative.
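The misconception that a p value indexes replicability can be checked with a standard back-of-the-envelope calculation. The Python sketch below assumes a two-sided z test with known variance, an exact replication with the same sample size, and, crucially, that the true effect equals the effect observed in the original study. Under those assumptions, a result that landed exactly at p = .05 has only about a 50% chance of reaching significance again in the same direction. The function name replication_probability is illustrative and not drawn from any library.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def replication_probability(p_observed: float, alpha: float = 0.05) -> float:
    """Chance that an exact replication reaches two-sided p < alpha in the
    same direction, ASSUMING the true effect equals the one observed in the
    original study (z test, known variance, same sample size)."""
    z_obs = STANDARD_NORMAL.inv_cdf(1 - p_observed / 2)  # observed |z|
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)      # significance cutoff
    # Under the assumption, the replication's z statistic is N(z_obs, 1).
    replication_z = NormalDist(mu=z_obs, sigma=1)
    return 1 - replication_z.cdf(z_crit)


for p in (0.05, 0.01, 0.005):
    print(f"original p = {p}: replication probability = "
          f"{replication_probability(p):.2f}")
```

Even a considerably smaller original p value leaves a substantial chance that an exact replication will not reach significance, which is why p cannot be read as the probability of replication.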

The alternative tradition, small-N experimental designs, which some describe as N = 1 or N-of-1 designs [ 25 , 38 , 49 – 51 ], has a long history in psychology, antedating the rise and eventual dominance of NHST in psychological research. Deployed frequently in psychophysics [ 52 – 57 ], it has roots in Fechner’s earliest works (see [ 54 , 58 ]). The approach typically does not require inferential statistics for evaluating data, primarily because it relies heavily on experimental control rather than on the statistical control intrinsic to group designs ([ 51 , 59 – 61 ], see [ 62 ], for historical usage). Additionally, it has the unique characteristic of inherently requiring replications as a matter of course (see [ 17 , 18 , 27 ]). In this tradition, research may begin with a formal hypothesis not driven by statistical considerations (see [ 45 ]) or with an informal hunch about some functional relationship between independent and dependent variables. What drives the outcome is the rigor of the experimental control used to demonstrate such functional relationships: for the same subjects across repeated exposure to various conditions (intrasubject replication), between different subjects exposed to similar conditions (intersubject replication), or across settings, situations, species, and so on. In so doing, the approach establishes not only strong internal validity but also generality of the effects [ 18 , 51 , 63 ]. Data are typically evaluated with graphs depicting patterns of change in the dependent variable of interest (see [ 61 , 64 ]), relying mostly on visual inspection.
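Intersubject replication can be illustrated with a small check applied to each subject in turn. The Python sketch below uses invented baseline (A) and treatment (B) observations for three hypothetical subjects and a deliberately strict, hypothetical decision rule: every treatment point must lie above every baseline point. The rule is an assumption made for illustration; it is a rough numerical stand-in for, not a replacement of, the visual inspection described above.

```python
# Hypothetical baseline (A) and treatment (B) observations for three
# imaginary subjects. Numbers are invented for illustration only.
subjects = {
    "S1": {"A": [3, 4, 2, 3], "B": [8, 9, 9, 10]},
    "S2": {"A": [5, 4, 6, 5], "B": [9, 11, 10, 12]},
    "S3": {"A": [2, 3, 3, 2], "B": [3, 5, 6, 7]},
}


def clear_change(baseline, treatment):
    """A deliberately strict, hypothetical rule: the change counts as clear
    only if every treatment point lies above every baseline point."""
    return min(treatment) > max(baseline)


# Apply the same check to each subject separately (intersubject replication).
results = {name: clear_change(d["A"], d["B"]) for name, d in subjects.items()}
for name, ok in results.items():
    print(f"{name}: clear change = {ok}")

replicated = sum(results.values())
print(f"Effect replicated in {replicated} of {len(subjects)} subjects")
```

Here the effect is clear for two subjects, while the third shows an upward trend that the strict rule does not count, the kind of case an analyst would resolve by examining the graph.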

Although Fechner is often credited with founding psychophysics [ 56 , 65 ], which has also traditionally relied on extensive studies of only a few subjects, he is, ironically, also credited with introducing “statistical methods” to psychology in terms of what Stigler described as “probability-based modeling and inference” [ 18 , 58 ]. Today, cognitive psychology tends to rely mostly on large-N group designs with their attendant complexities, whereas behavioral psychology tends to rely mostly on small-N experimental designs [ 35 ]. The two areas of psychology reliably approach their subject matter from different vantage points. Conceptually, for example, behavior analysis characterizes memory as remembering, reflecting its long-standing recognition of the phenomenon as an action event rather than a hypothetical construct (e.g., [ 66 ], see also [ 67 ], White and Wixted [ 56 ]). Because memory is typically construed as a cognitive phenomenon, one might expect, from a contemporary standpoint, that it would be studied using the standard cognitive methodology relying on large-N group designs.

The present study focuses on how reliably classic memory studies can be reproduced; these studies address what would presumably be considered cognitive topics but, for the most part, did not historically rely on large-N group designs. In doing so, one hopes to illuminate the methodological issues involved in the current crises of replication and reproducibility of psychological phenomena [ 20 , 68 , 69 ]. Ebbinghaus’ study of memory was prominent in Dukes’ [ 50 ] enumeration of important psychological reports that used N = 1 research. The reliability of reproducible effects is, after all, the hallmark of empirical findings in science. Achieving field replications, in terms of Huffmeier et al.’s [ 70 ] replication typology, provides such reliability for the memory phenomena reported here. As highlighted further below, Ebbinghaus’ memory work has had a long history of successful replication. To be sure, other important discoveries in psychology have derived from studies that did not rely on the inferential statistics commonly used in large-N group designs [ 71 ]. The classic memory studies reviewed here appear to be of the same caliber. They cover three important topics in memory: (1) immediate memory span, (2) chunking, and (3) levels of processing.

Classic memory studies

“Psychological knowledge is not acquired a priori – we cannot know in advance what will emerge as reliable findings without replicating initial findings.” [ 72 ]

Findings from classic memory experiments on immediate memory span (e.g., [ 73 – 75 ], see [ 76 ]), chunking [ 74 ], and levels of processing (e.g., [ 77 ]) have had a long-standing impact on our understanding of memory processes in psychology. Fifteen of the 20 articles (75%) cited in Miller’s [ 74 ] review that culminated in the magical number seven were published in the 1950s, only three (15%) in the 1930s, and one each in 1945 and 1904 (10%). Except for the 1904 citation, the seminal works of Guilford and Dallenbach [ 73 ] and Oberly [ 75 ] on immediate memory span, which informed Miller’s review, antedated all of these works. What is noteworthy about these two earliest works is that they studied memory processes using experimental designs devoid of statistical inference. Guilford and Dallenbach’s study, for example, was “an intensive study upon a few Ss, and extensive study upon a large class” [ 73 ]. Oberly [ 75 ] worked with seven participants presented with 2–14 digits, whose memory spans ranged from 6 to 14. Oberly’s extensive study involved 100 participants presented with 4–12 digits, either randomly or in sequence, yielding memory spans of 8.9 in each case. Notably, again, Oberly did not deploy inferential statistics; indeed, the remaining narrative and discussion following the presentation of the group data focused largely on the verbal reports of the seven individual participants.
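How an immediate memory span is scored affects the number reported. The classic papers’ exact scoring procedures are not described here, so the Python sketch below assumes two simple, hypothetical rules applied to one invented participant’s record of which digit-list lengths were recalled without error: the longest list recalled correctly, and the last length passed before the first failure when lengths increase. Neither rule is claimed to be the one Guilford and Dallenbach or Oberly used.

```python
# Hypothetical record of one participant's trials: list length -> whether
# the whole digit series was recalled correctly. Invented data.
trials = {4: True, 5: True, 6: True, 7: True, 8: False, 9: True, 10: False, 11: False}


def longest_correct_span(results):
    """Hypothetical rule 1: the longest list length recalled without error."""
    correct = [length for length, ok in results.items() if ok]
    return max(correct) if correct else 0


def first_failure_span(results):
    """Hypothetical rule 2: the last length passed before the first failure,
    with lengths taken in increasing order."""
    span = 0
    for length in sorted(results):
        if not results[length]:
            break
        span = length
    return span


print(longest_correct_span(trials))  # 9
print(first_failure_span(trials))    # 7
```

The same record yields a span of 9 under one rule and 7 under the other, a reminder to check the scoring convention before comparing spans across studies.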

On the topic of immediate memory span, Miller’s review of absolute judgments of unidimensional and multidimensional stimuli concluded that “[t]here is a clear and definite limit to the accuracy with which we can identify absolutely the magnitude of a unidimensional stimulus variable,” which he specified to be “in the neighborhood of seven” [ 74 ]. Miller also noted that immediate memory is limited in the number of chunks it can hold, a limit he argued could be circumvented by processes involving “recoding,” in which we may construct “larger and larger chunks, each chunk containing more information than before” [ 74 ]. In addition to Guilford and Dallenbach [ 73 ] and Oberly [ 75 ], Miller’s review included other works that did not employ inferential statistics. Pollack [ 78 ], for example, studied verbal learning by 25 participants and reported their mean data, without any apparent appeal to inferential statistics to interpret the outcomes. Carmichael et al. [ 79 ] was another of the papers reviewed, cited to support the influence of naming on visual perception. In their study, they presented visual images with two lists of labels for various objects to 48 and 38 experimental participants, respectively, and to 9 control participants who received no names. The analyses and interpretation of the results involved no statistical treatment at all. Thus, even the studies that employed large numbers of participants did not resort to inferential statistics to make sense of the data.
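Miller’s notion of recoding can be illustrated with binary digits, in the spirit of the binary-to-octal example he discussed. The Python sketch below regroups a hypothetical 18-digit binary sequence into 3-digit chunks and renames each chunk with a single octal digit, so that the same information occupies 6 chunks instead of 18 separate items. The function name and the sequence are illustrative assumptions.

```python
def recode_binary_to_octal(bits: str) -> list[str]:
    """Group a binary string into 3-digit chunks and rename each chunk with
    its octal digit, a simple instance of recoding into larger chunks."""
    if len(bits) % 3 != 0:
        raise ValueError("length must be a multiple of 3 for this sketch")
    chunks = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return [str(int(chunk, 2)) for chunk in chunks]


sequence = "101000100111001110"  # 18 binary digits (hypothetical)
octal = recode_binary_to_octal(sequence)
print(len(sequence), "digits before recoding")
print(len(octal), "chunks after recoding:", "".join(octal))
```

Eighteen items would exceed a span of about seven, whereas six recoded chunks fall within it; the information is unchanged, only its packaging differs.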

As Murray’s historical analyses of the influence of nineteenth-century memory research concluded, modern memory research topics such as levels of processing [ 77 ] have some connection to nineteenth-century work on memory. Much of the memory research from that era was notably of the small-N variety, in the tradition of Ebbinghaus’s groundbreaking self-experimentation with nonsense syllables (e.g., [ 80 ]), as was that of the early twentieth century [ 81 – 84 ]. Even modern replications of Ebbinghaus have stayed true to the tradition (e.g., [ 85 , 86 ]). Kirkpatrick [ 87 ] conducted memory experiments with large numbers of students but still did not rely on inferential statistics to interpret the results. By the time Miller’s paper was published in 1956, the use of group designs and the reporting of p values in psychological research had just reached their peak [ 25 , 62 , 88 ], having virtually replaced the critical ratios and probable errors that were prevalent when the Guilford and Dallenbach [ 73 ] and Oberly [ 75 ] papers appeared. Predictably, then, much of the work reviewed by Craik and Lockhart [ 77 ] derived from mainstream psychology research that emphasized reporting of NHST. In their paper, Craik and Lockhart advocated meaningful, deeper processing as an aid to retention of information. Moscovitch and Craik [ 89 ] provided some empirical evidence in support of the depth-of-processing view of memory. They reported three experiments, using large numbers of participants and NHST to analyze and interpret the data, demonstrating better recall with meaningful sentences than with rhymes. In the same vein, four other studies reanalyzed in light of the depth-of-processing notion all used large numbers of participants and inferential statistics to report their findings [ 90 – 94 ]. The exclusive reliance on NHST in the relevant levels-of-processing research thus reflects its widespread adoption in mainstream psychology.

Recent experimental replications of Ebbinghaus’ memory experiments and the use of savings have variously stayed true to his methods (e.g., Murre and Dros [ 85 ]). They used a small number of participants, syllables as stimuli, and the method of savings as the primary dependent variable. Murre and Dros provide the most modern replication of Ebbinghaus’s memory experiments that stayed close to his approach. Even they, however, succumbed to the analytical zeitgeist by occasionally reporting NHST in their data analyses, perhaps reflecting Ebbinghaus’ own tendency toward methodological eclecticism [ 95 ]. Notably, though, their results confirmed Ebbinghaus’, supporting the generality of the memory phenomena he explored. The versatility of the subject matter, combined with the rigor of the methodology used in the original classic studies, makes these studies well suited to examining the contemporary problems of replication and reproducibility afflicting psychological science. The replicability of the Ebbinghaus memory phenomena [ 85 ] illustrates the point. The classic studies of immediate memory span, chunking, and levels of processing offer additional lines of evidence for the generality of effects reported using methods other than those widely employed in mainstream psychology today. Collectively, they have withstood the test of reproducibility, having been reliably reproduced in experimental laboratory preparations. These latter classic cases, the subjects of the present report, provide a particular opportunity to explore the extent of the generality supported by their largely small- N experimental roots. The opportunity is not one of a prospective study of these memory phenomena, however.

Many introductory textbooks provide demonstration activities (e.g., [ 96 ]) on these phenomena for the classroom. The retrospective examination covered results from three such classroom memory exercises, on immediate memory span, chunking π, and depth of processing. These were conducted between 2013 and 2019 in various introductory psychology courses, including special sections on social justice. The activities reflect specific attributes of the classic studies discussed above. All involve cognitive processes that today’s psychology would not consider appropriately studied with the original methodologies. They also had in common that completing the exercises produced quantitative data, collected at the time of the demonstrations. Classroom demonstrations, of course, occur in environments unlike the laboratories that produced the original experiments establishing these phenomena. They may still reproduce the expected effects under such uncontrolled conditions. If so, they further attest to the robustness of the original findings and provide ecological extensions of those findings. They also carry interesting implications for how we understand, in a contemporary context, the experimental designs and analyses originally used to report them. What follows describes the procedures used to collect the retrospective data in various classrooms.

Undergraduate students enrolled in introductory psychology courses over multiple semesters from 2013 to 2019 participated in the classroom memory demonstrations. They were typical, mostly freshman students at a predominantly white, private Catholic university in the US Midwest. Table 1 shows the activities for which data were collected, including those from introductory psychology classes with social justice themes. Of the three activities, namely, immediate memory span, chunking π, and depth of processing, data on immediate memory span were limited to fall 2017 through fall 2019. Each activity was implemented using the instructions in the instructor’s materials (IM) that accompanied Bernstein’s [ 96 ] introductory psychology textbook, along with the materials and display items for each demonstration:

Immediate memory span (IMS) exercise

The immediate memory span exercise was Activity #1 on short-term memory of Supplement 8.10 in the IM that accompanied Bernstein [ 96 ]. The stimuli were 10 series of digits starting with three digits and ending with 12 digits, each series increasing by one digit.

Students saw the numbers displayed one digit at a time, with each subsequent series longer than the last. At the end of each series, following a very brief pause, they wrote down the digits in sequence. After all the series had been presented individually, students saw all the digits of every series at once to check against their written series. They then determined their IMS as the length of the series preceding the one with their first error. A headcount of spans followed, with a discussion of the 7 ± 2 capacity of short-term memory.
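The scoring rule above (span equals the length of the series just before the first error) can be made concrete with a minimal Python sketch. It assumes one right/wrong flag per series, in the order shown. The names `SERIES_LENGTHS` and `immediate_memory_span` are hypothetical. The handling of an error on the very first series (span 0) is also an assumption, because the exercise instructions do not address that case.

```python
from typing import Sequence

# Hypothetical encoding of the exercise: ten series, from 3 to 12 digits long.
SERIES_LENGTHS = list(range(3, 13))


def immediate_memory_span(correct: Sequence[bool],
                          lengths: Sequence[int] = SERIES_LENGTHS) -> int:
    """Return the length of the series just before the first error.

    correct[i] is True if the student wrote series i back without error.
    Assumptions: an error on the first series credits a span of 0, and a
    student with no errors is credited with the longest series shown.
    """
    if len(correct) != len(lengths):
        raise ValueError("expected one correctness flag per series")
    span = 0
    for length, ok in zip(lengths, correct):
        if not ok:
            break  # stop at the first error; later series do not count
        span = length
    return span


# Hypothetical student: series of 3-7 digits correct, first error on the 8-digit series.
flags = [True] * 5 + [False, True, False, False, False]
print(immediate_memory_span(flags))  # 7
```

Note that a correct answer after the first error (the 9-digit series above) does not raise the span, which matches the rule as stated.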

Chunking π exercise

The chunking exercise appeared in Supplement 8.11 of Bernstein’s [ 96 ] IM. According to the instructions, students saw 20 digits of π on the screen and examined them briefly. After a distraction task, they wrote down as many digits of π as they could remember. A headcount of students remembering each number of digits, from 20 down to 1, followed (Before). The digits were then displayed, grouped to accompany a story narrated to the class. Following the distraction task again, students wrote down as many digits of π as they could remember. Another headcount of digits recalled followed (After), with a display of the tallies and a discussion of chunking and the role of meaningful processing.
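The headcount step can be sketched in Python as a tally of students per number of digits recalled, from 20 down to 1. The text does not say how students scored their own recall, so the rule in `digits_recalled` is an assumption: correct digits in order, up to the first mistake. It is also assumed that the 20 displayed digits include the leading 3. `PI_20`, `digits_recalled`, and `headcount` are hypothetical names, and the responses are made up.

```python
from collections import Counter

# Assumption: the 20 displayed digits are the leading 3 plus 19 decimals.
PI_20 = "31415926535897932384"


def digits_recalled(response: str, target: str = PI_20) -> int:
    """Count digits written correctly, in order, up to the first mistake.

    Hypothetical scoring rule: the text does not say how students scored themselves.
    Non-digit characters such as the decimal point are ignored.
    """
    written = [ch for ch in response if ch.isdigit()]
    n = 0
    for w, t in zip(written, target):
        if w != t:
            break
        n += 1
    return n


def headcount(responses):
    """Tally students by digits recalled, from 20 down to 1 (students at 0 are left out)."""
    counts = Counter(digits_recalled(r) for r in responses)
    return {k: counts.get(k, 0) for k in range(20, 0, -1)}


before = headcount(["3.14159", "3.1416", "3.14"])  # made-up responses
print({k: v for k, v in before.items() if v})  # {6: 1, 4: 1, 3: 1}
```

Running the same tally on the After responses yields a second distribution that can be compared with the first.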

Depth of processing (DoP) exercise

The depth of processing exercise, in Supplement 8.14, provided implementation instructions and accompanying task instructions for students to illustrate the levels of processing model of memory. The exercise began by dividing the class into two groups, A and B. For words read aloud to the class, one group received instructions to count the vowels (maintenance rehearsal). The other group was told to judge each word’s usefulness if stranded on an island (elaborative rehearsal). The respective instructions were displayed on the screen. When Group A received its instructions, Group B had their eyes closed, and vice versa. The list included 22 words, ranging from umbrella to bottle. A distraction task followed that lasted about 30 s, during which students wrote down their name, address, phone number, major, and social security number. They then wrote down as many words from the list as they could remember. A headcount of how many words each group remembered, and a subsequent discussion of levels of processing, then followed.
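Scoring each recall sheet against the word list, and then tallying each rehearsal group, can be sketched in Python as follows. Only "umbrella" and "bottle" are named in the text, so `WORD_LIST` is a hypothetical stand-in rather than the 22-word list from the instructor's materials. The decisions that repeats and non-list words (intrusions) score nothing are also assumptions. Function names are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in list: the text names only "umbrella" and "bottle";
# the other words of the 22-word list in the instructor's materials are not given.
WORD_LIST = frozenset({"umbrella", "rope", "mirror", "blanket", "bottle"})


def words_recalled(written, word_list=WORD_LIST) -> int:
    """Distinct list words a student wrote down; repeats and intrusions score nothing."""
    cleaned = {w.strip().lower() for w in written}
    return len(cleaned & word_list)


def group_headcount(group):
    """Map number of words recalled -> number of students, for one rehearsal group."""
    return dict(sorted(Counter(words_recalled(s) for s in group).items()))


maintenance = [["bottle"], ["rope", "car"]]                  # made-up recall sheets
elaborative = [["Umbrella", "rope", "mirror"], ["bottle", "blanket"]]
print(group_headcount(maintenance), group_headcount(elaborative))  # {1: 2} {2: 1, 3: 1}
```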

The data reported were all count data collected by show of hands in the classroom. If the memory span activity is successful, students should remember mostly between 5 and 9 items, inclusive, as predicted by the classic studies on the topic (see [ 74 ]). Success in the chunking exercise entails students remembering more digits of π after the meaningful story than before it. The data derived from headcounts. As a result, recalled items could not be matched before and after for each student, as would be customary in a laboratory version of the study using a small- N design. Chunking predicts that recoding into larger units [ 74 ] increases the number of items remembered. Accordingly, students should remember more digits of π after the story. Finally, success in the DoP activity is reflected in students instructed in maintenance rehearsal remembering fewer words than those instructed in elaborative rehearsal, in accord with the levels of processing theory of memory [ 77 ].
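The three success criteria can be expressed as checks on headcount distributions, written as `{items recalled: number of students}`. The Python sketch below reads "mostly" as a strict majority, which is an assumption. It also uses a higher headcount-weighted mean as a stand-in for the visual "rightward shift", another assumption, since the study itself relied on visual inspection. Because headcounts cannot be matched student by student, the comparison is between two unpaired distributions. All function names and numbers are hypothetical.

```python
def share_in_range(hist, low=5, high=9):
    """Proportion of students whose count lies in low..high inclusive (7 ± 2 by default)."""
    total = sum(hist.values())
    inside = sum(n for k, n in hist.items() if low <= k <= high)
    return inside / total


def span_exercise_succeeded(hist):
    # Assumption: "mostly between 5 and 9" is read as a strict majority of students.
    return share_in_range(hist) > 0.5


def mean_recall(hist):
    """Average items recalled, weighting each value by its headcount."""
    return sum(k * n for k, n in hist.items()) / sum(hist.values())


def shifted_up(lower_condition, higher_condition):
    """Unpaired comparison of two headcounts (students cannot be matched across them).

    Assumption: a higher mean stands in for the visual 'rightward shift'.
    """
    return mean_recall(higher_condition) > mean_recall(lower_condition)


# Made-up headcounts: {items recalled: number of students}
span = {4: 1, 5: 3, 7: 5, 9: 2, 10: 1}
print(span_exercise_succeeded(span))              # True (10 of 12 students in 5-9)
print(shifted_up({7: 4, 10: 3}, {12: 2, 20: 5}))  # chunking: before vs. after -> True
```

For the DoP activity, the same check applies with the maintenance headcount as the first argument and the elaborative headcount as the second.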

Figure 1 presents data on immediate memory span from 11 sections across five semesters, from 2017 through 2019. Each graph presents one section’s data for a semester. The last two graphs show data from the special sections on social justice (PS100). Figure 1 shows that in each semester, most students’ spans fell within the 7 ± 2 range, indicated by the colored bars. Most sections (7 of 11, 64%) recorded students below the 7 ± 2 span, whereas only two (18%; FA2017A and FA2019) recorded students above it. Incidentally, the two sections recording students above the span were also among the sections with students below it. FA2017A recorded 4 below and 1 above, whereas FA2019 recorded 1 below and 2 above.

Figure 1. Number of students who remembered a given number of items in an immediate memory span demonstration exercise in introductory psychology courses across 11 sections in 5 semesters from 2017 to 2019. Letters A and B represent different sections of the course in the same semester, and green bars reflect data within 7 ± 2.
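The section-level tallies reported for Figure 1 count the sections with any student below 5 or above 9. A Python sketch of that tally follows. The section labels and headcounts in the test are made up, and `out_of_span`, `percent`, and `summarize_sections` are hypothetical names. The final line only re-derives the reported percentages from the reported counts.

```python
def out_of_span(hist, low=5, high=9):
    """Return (students below low, students above high) for one section's headcount."""
    below = sum(n for k, n in hist.items() if k < low)
    above = sum(n for k, n in hist.items() if k > high)
    return below, above


def percent(part, whole):
    return round(100 * part / whole)


def summarize_sections(sections):
    """sections: hypothetical mapping of section label -> {span: students}.

    Returns the percentage of sections with any student below 5 and with any above 9.
    """
    n = len(sections)
    n_below = sum(1 for h in sections.values() if out_of_span(h)[0] > 0)
    n_above = sum(1 for h in sections.values() if out_of_span(h)[1] > 0)
    return percent(n_below, n), percent(n_above, n)


print(percent(7, 11), percent(2, 11))  # 64 18, the shares reported for Figure 1
```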

Figure 2 presents the before and after counts of students who remembered π up to the 20th digit across nine semesters from 2013 through 2019 in 14 sections. Data before the story were not available for three semesters: spring 2014, spring 2015, and fall 2019. As such, adequate comparisons are possible for only 11 of the 14 sections. In every section where a comparison is possible, students tended to remember more digits of π after the narrated, meaningful (albeit arbitrary) story that accompanied the digits than before it. After-story data are shown in red in the figure and before-story data in blue. Visual inspection of the graphs reveals the effect in two ways. First, the peaks in the number of students remembering the digits of π fell at later digits after the story than before it. After the story, peaks were at the 20th digit, except for SP2014 at the 7th digit and SP2017A at the 17th digit. Before the story, they fell between the 7th and 12th digits across sections. That is, in 12 of 14 cases (86%), more students recalled all 20 digits (indicating later peaks) after the meaningful story than before it. Second, the overall distribution of students remembering π shifted rightward after the story compared to before it. In the three sections without before data, students’ recall after the story was comparable to that in the other 11 sections.

Figure 2. Number of students who remembered digits of π as a function of number of digits remembered before (blue circles) and after (red squares) hearing an arbitrary story containing digits of π to the 20th digit in introductory psychology courses across 14 sections in 9 semesters from 2013 to 2019. Letters A and B represent different sections of the course in the same semester.
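The two visual criteria used for Figure 2, a later peak and a rightward shift of the whole distribution, can be approximated numerically. The Python sketch below takes the peak as the most frequent number of digits recalled, with ties broken toward the larger value (an assumption). It stands in a higher median for the rightward shift (also an assumption), and it uses the standard library's `statistics.median`. The function names and headcounts are hypothetical, not the study's data.

```python
from statistics import median


def peak(hist):
    """Value with the most students; ties go to the larger value (an assumption)."""
    top = max(hist.values())
    return max(k for k, n in hist.items() if n == top)


def median_recall(hist):
    """Median digits recalled, expanding the headcount into one value per student."""
    return median(k for k, n in hist.items() for _ in range(n))


def later_peak_and_shift(before, after):
    """A later peak, and a rightward shift (here, a higher median)."""
    return peak(after) > peak(before), median_recall(after) > median_recall(before)


before = {7: 4, 10: 2, 12: 1}   # made-up headcounts, not the study's data
after = {12: 1, 17: 2, 20: 4}
print(later_peak_and_shift(before, after))  # (True, True)
```

Medians are used here because a single student recalling all 20 digits moves them less than a mean; either summary only approximates what the figure conveys visually.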

Finally, Fig. 3 presents the number of students who remembered list items following a maintenance rehearsal task compared with an elaborative rehearsal task. The data come from 17 sections across 14 semesters, from fall 2013 through 2019, each graph representing one section’s data for a semester. In the elaborative rehearsal condition, students were instructed to consider how the list items could be useful to them if stranded on an island (shown in red). In the maintenance rehearsal condition, they were instructed to count the vowels in the words read to them (shown in green). Visual inspection of the figure shows that, in each section, students remembered more words under the elaborative instructions than under the maintenance instructions. The rightward shifts in the student distributions with elaborative rehearsal are indicative of this effect. The lone exception was one student in the SP2019A section who remembered more with maintenance rehearsal than students who used elaborative rehearsal. That is, the effect occurred in 94% (16 of 17) of the sections.

Figure 3. Number of students remembering items as a function of number of items remembered following maintenance (green bars) or elaborative (red bars) rehearsals in a depth of processing demonstration exercise in introductory psychology courses across 17 sections in 14 semesters from 2013 to 2019. Letters A and B represent different sections of the course in the same semester.
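The per-section exception described for Figure 3 (a maintenance-rehearsal student outscoring elaborative-rehearsal students) can be flagged automatically. In the Python sketch below, a maintenance student is flagged when they recalled more words than the lowest-scoring elaborative student in the same section. This is only one reading of the SP2019A exception and is an assumption. `maintenance_overlap`, `sections_without_overlap`, and all headcounts are hypothetical. The last line only re-derives the reported 94% from one exception among 17 sections.

```python
def maintenance_overlap(maintenance, elaborative):
    """Maintenance-group students who recalled more words than the lowest-scoring
    elaborative-group student (one reading of the SP2019A exception; an assumption)."""
    floor = min(k for k, n in elaborative.items() if n > 0)
    return sum(n for k, n in maintenance.items() if k > floor)


def sections_without_overlap(sections):
    """sections: hypothetical mapping label -> (maintenance headcount, elaborative headcount).

    Returns (sections without overlap, total sections, percentage).
    """
    clean = sum(1 for m, e in sections.values() if maintenance_overlap(m, e) == 0)
    total = len(sections)
    return clean, total, round(100 * clean / total)


print(round(100 * 16 / 17))  # 94, one exception among 17 sections
```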

Each set of results from the memory span, chunking, and depth of processing exercises showed discernible patterns across the semesters. These patterns were generally consistent with the findings of the original memory experiments in psychology. In each case, the graphical presentations depicted the effects clearly enough for visual inspection, so no inferential statistical analysis was required to understand them. In experimental data, we seek regularities amid irregularities [ 51 ]. In each activity in the present study, amid any variability in counts, the pertinent data displayed outcomes in line with the classic studies. Most students remembered 5–9 items, students remembered more digits of π after than before the meaningful story, and students remembered more with deep processing than with superficial processing.
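The claim that these effects are visible without inferential statistics can be illustrated even without plotting software. The Python sketch below renders two headcounts side by side as rows of symbols, one symbol per student, with the highest values at the top. A shift from one condition to the other then shows up as a change in where the symbols sit. It is a hypothetical text stand-in for the study's figures, not the method used to draw them, and the headcounts are made up.

```python
def text_histogram(first, second, symbols=("o", "x")):
    """Render two headcounts side by side, one row per value (highest first).

    Each symbol stands for one student: symbols[0] for `first`, symbols[1] for `second`.
    """
    values = sorted(set(first) | set(second), reverse=True)
    width = max(len(str(v)) for v in values)
    left_width = max(first.values(), default=0)
    rows = []
    for v in values:
        left = symbols[0] * first.get(v, 0)
        right = symbols[1] * second.get(v, 0)
        rows.append(f"{v:>{width}} | {left:<{left_width}} | {right}".rstrip())
    return "\n".join(rows)


# Made-up maintenance (o) vs. elaborative (x) headcounts
print(text_histogram({5: 2, 6: 1}, {6: 1, 8: 3}))
```

For these made-up counts the output is three rows, `8 |    | xxx`, `6 | o  | x`, and `5 | oo |`, in which the x column visibly sits higher than the o column.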

Methodological legacy

“It is possible that, in several fields of psychological science, the current dominant paradigm when replication is attempted is that of perpetuated fallacies. Replication efforts, rare as they are, are done primarily by the same investigators who propose the original discoveries.” [ 2 ]

This is not the case for these memory phenomena, even in uncontrolled environments. First, in determining the immediate memory span, most students remembered items within the magical 7 ± 2 range in each semester. Each section from each semester represented an independent replication. As such, there were 11 of 11 (100% of the sections) successful replications of this effect, with success signified by most students remembering 5–9 items (see Fig. 1). In 64% of the sections, some spans fell below Miller’s [ 74 ] minimum of five. Even so, they did not rise to the level of evidentiary support for a memory span of four suggested by Cowan [ 97 , 98 ] (cf. [ 62 , 99 – 102 ]). Second, wherever a comparison was possible (in 79% of the sections), the number of students remembering digits of π shifted rightward once the meaningful story was attached to the digits. In two of the three sections without before data, recall peaked at the 20th digit. Altogether, then, there were 14 rightward shifts and peaks at the 20th digit in students’ recall of π. That represents 14 replications of the positive effect of attaching a meaningful story to the 20 digits of π. Finally, in every semester, students remembered more following elaborative than following maintenance rehearsal. There were 17 sections showing the effect, representing 17 successful replications (i.e., 100%).

These results collectively indicate the robustness of the respective phenomena demonstrated. They establish the validity of historically important psychological findings on memory span, chunking, and depth of processing [ 73 – 75 , 77 ], replicated in uncontrolled classroom environments. Each effect was discernible by visual inspection without statistical inference. Most students remembered 5–9 items (Fig. 1), students remembered more digits of π after hearing a linked story (Fig. 2), and students tended to remember better with meaningful processing than with superficial processing (Fig. 3). Note that variability in students remembering π digits is present both before and after the linked story. This suggests that students came into the class exercise with varied knowledge of π digits. Furthermore, the results corroborate the relevance of historical small- N methodology for the study of cognitive processes. Today’s psychological research world would otherwise consider these processes appropriately studied using group-design methodology. Finally, by providing “real-world” ([ 70 , 103 ]) extensions, they support the generality of these classic reports of memory phenomena from the standpoint of the second research tradition of psychology noted in the introduction. In that tradition:

Contrary to what is usually assumed about the small- N experimental approach, namely, that it lacks generality due to the sample size that is usually small compared to what is typical in the alternative group-design approaches, generality is of paramount interest and is usually accounted for in behavior analytic research. Replication is what affirms generality, especially of the type sought after by mainstream psychologists. ([ 18 , 25 , 27 ])

Pedagogic and methodological implications and historical antecedents

“Significance testing never makes a useful contribution to the development of cumulative science.” [ 33 ]

In light of the ongoing replication crisis in psychology (e.g., [ 5 , 6 ]), the results of this report are noteworthy for both pedagogic and research purposes. Pedagogically, they illustrate the value of in-class activities that demonstrate psychological phenomena with a firm foundation of empirical reproducibility. Physics demonstration experiments play the same role in illustrating established physical principles (e.g., [ 104 ]). Indeed, if these phenomena were not so firmly established, the activities would be deficient as demonstrations of psychological principles. Their outcomes would be haphazard and unpredictable, making them unworthy as classroom demonstrations. As Poling et al. [ 64 ] pointed out, “[i]n science, repeatability is tantamount to believability. Relations that can be reproduced are accepted as real; those that cannot be reproduced are rejected” ([ 64 , 96 ] IM).

For research purposes, the history of the entrance and ascendance of inferential statistics in psychology is illuminating. The coupling of psychological research and statistical inference [ 58 , 105 ] defined the path that separated mainstream psychology from behavior analysis [ 25 ]. It left the former with, and the latter without, a pervasive replication problem (see [ 18 ] for a similar case made for vision research). As mentioned in the introduction, replications tend to be associated naturally with small- N designs [ 51 , 63 , 106 ]. According to Stigler, following Peirce’s adoption of “randomization to create an artificial baseline…, Fechner’s control of experimental conditions, like that of Müller, Wundt, and Ebbinghaus, created an artificial baseline and a framework that made statistical investigation possible. Psychology has never been the same since” [ 58 , 61 , 64 ].

Productive science was done in psychology before inferential statistics and psychological research were coupled. Judging by that period, the issues and problems introduced by the wholesale embrace of NHST seem not to have been necessary for building a cumulative science (see [ 39 , 107 ]). Hubbard and Ryan’s [ 88 ] findings on APA journals’ reporting of inferential statistics showed that empirical research in psychology before the 1910s did not rely on statistical inference to make sound decisions about psychological results. Boring’s [ 62 ] report of “experimental control” in the American Journal of Psychology followed a similar historical pattern. It showed increasing use of control groups or comparisons from the mid-1910s through the early 1950s, following the rise of NHST. Indeed, most, if not all, foundational principles were discovered in experiments conducted without statistical tools of the sort in widespread use in psychological research today. One has to ask what the benefit of introducing these tools is: Are new discoveries better because of their introduction? Developments that retard progress are not worth having (see [ 2 ]). We should adopt and embrace tools because they make our march towards a cumulative science possible, not because they make doing the science convenient for us, as NHST does.

Implications of small- N designs and visualization of effects

Others have noted the important role expert judgements play in doing science (see, e.g., [ 51 , 108 , 109 ]) and advocated for their use in psychological research [ 25 ]. Applying expert judgement may not be a convenient and “quick” route to getting a manuscript out in a timely manner, but applying the dichotomous, yes-or-no answer that NHST affords certainly is (see [ 110 , 111 ]). Consider what happens when conditions change, for example when an elaborative-rehearsal task, as opposed to a maintenance-rehearsal task, precedes the memory test of previously encountered material. The resulting difference in recall can be visualized readily, even without expertise. This was the case in the depth of processing demonstration (see Fig. 3 for the graphical shifts in items recalled in that exercise). Graphical visualization is, at any rate, a recommended best practice [ 19 , 49 ] (see also [ 112 ]), and its use in decision-making can be trained (e.g., [ 113 – 116 ]). Expertise in visual inspection thus is demonstrably trainable. Nevertheless, as the earliest generations of psychological researchers amply demonstrated (see [ 88 ]), research practices do not have to involve inferential statistics to be valuable and productive. The replication crisis arose in the context of NHST use. Yet many of those pre-1910 studies were not memory studies, and they still reported findings without inferential statistics. They thus show that non-memory psychological phenomena, too, can be studied and reported without inferential statistics, as the classic memory studies replicated in the present study were.

As noted in the introduction, Ebbinghaus’ study of memory was one of the important psychological reports that used N = 1 research [ 50 ]. Psychology’s early and later history is replete with such a research approach [ 71 ], one that did not involve significance testing at all. Psychophysical discoveries such as Fechner’s were mentioned above (e.g., see [ 54 , 65 ]). Beyond these, classic discoveries in psychology that used no t- or F-test and reported no p values, confidence intervals, or Bayes factors are numerous. Among the works so identified by Gigerenzer and Marewski are Jean Piaget’s child development stages (see, e.g., [ 117 ]), Wolfgang Köhler’s ape intelligence [ 118 ] and his Gestalt laws of perception [ 119 ], Ivan P. Pavlov’s principles of classical conditioning (see [ 120 ]), B. F. Skinner’s principles of operant conditioning (see [ 121 ]), George Miller’s magical 7 ± 2 (see [ 74 ]), and Herbert A. Simon’s Nobel Prize–winning work in economics. Over and above the “methodological eclecticism” in the pursuit of measurement precision that allowed Ebbinghaus to achieve such acclaim in the study of memory [ 95 ] …

All of these characteristics attest to the possibility of a psychological science conducted without group designs and/or NHST. Piaget, for example, reported hundreds of detailed vignettes of cases to illustrate, demonstrate, or support his theories of development. He never adopted an experimental design that involved groups of children [ 117 , 122 ]. Köhler opposed the behaviorism of his day, largely because of opposing views on the epistemological status of objective reality and personal experience, which arose from their respective positions on introspection. Despite this opposition, he was sympathetic to Watson’s use of qualitative observations of children. He also objected to what he called the “quantitative method” that required statistical analysis of data. As he retorted, “[e]verything that is valuable in these observations would disappear if ‘results’ were handled in an abstract statistical version” [ 119 ]. … small- N experimental designs, which are distinct from the prevalent mainstream group designs and NHST.

What is to become of psychology?

“A student can complete our graduate program without learning anything at all about basic learning processes, or basic sensory and perceptual processes, or memorial and cognitive processes, or basic developmental processes, or social processes, or approaches to personality, and so on. Students, as in most graduate programs, can pick and choose among a few courses on those (and other) topics to provide them presumed breadth. But the only training every student must have is in NHST… this state of affairs has developed because of the reliance on NHSTs as the dominant method for analyzing data and deciding if results merit publication, thus retarding the development of cumulative, evolving, integrated knowledge.” [ 39 ]

“The experimental means for groups of adults generally range from about 3–5 chunks, whereas individual subject means range more widely from 2 to 6 chunks.” [ 97 ]

Can psychology be defined as the study of average behavior and mental phenomena, as opposed to the now-standard study of behavior and mental processes (e.g., [ 96 ])? Contemporary psychological research relies overwhelmingly on group designs, which continually yield reports of averages. An alien looking in could, indeed, surmise from this that psychology is the study of the average person, not of processes (cf. [ 18 , 51 ]). Not all psychological phenomena are conducive to examination by group designs (in fact, many are not), however. Just as human and nonhuman behavior tends to be an attribute of the individual, so are cognitions [ 18 , 34 ]. Surely, there are behaviors and cognitions that manifest as group phenomena, but most things that psychologists are interested in tend to be those of the individual. This is true even of social psychology. Social psychologists do not study average persons, but social influences and perceptions as variables that affect individual behavior and/or cognition. Phenomena like groupthink may be exceptions, and even then, the unit of interest is the group, not an average person.

Perhaps the best way to initiate a research project, going forward, is first to determine whether the phenomenon of interest is an attribute of the individual or a group process. Only then, second, should one choose a design that fits the phenomenon. A behavior and/or cognition that is fundamentally of/about the individual is better studied with designs that appropriately answer questions about the individual and not about the average person or animal. (It is possible, I guess, to be interested in the average person or animal per se, in which case the appropriate design of choice would be the group design.) A recent report on altruism in rodents [ 123 ] is illustrative. There have been questions on whether rodents engage in prosocial behavior for empathetic or altruistic reasons (e.g., [ 124 , 125 ]) or for social-contact reasons (e.g., [ 126 ]), a presumably social albeit biologic behavior. Seeking out the controlling variables in what appears prima facie to be a case of altruism took a systematic replication with a small- N experimental design. It also required reconfiguring the equipment and the prevailing economy of the test environment (see [ 123 ]). Refocusing the question informed the methodology deployed, which yielded ostensibly greater scientific clarity.

Finally, even Sidman is on record as saying that actuarial and other social or policy matters may actually require the use of and reliance on statistics (see [ 51 , 127 ]). It is therefore simply a matter of perceptive choice of methodology, tailored to a research question about the behavior and/or cognition of the individual or of the group. The works of Guilford and Dallenbach [ 73 ] and Oberly [ 75 ] on immediate memory span, described above, are illustrative. They combine features of small- N (in their intensive parts) and large- N (in their extensive parts) in the same studies, even without the aid of inferential statistics to grasp the meaning and interpretation of their results. In pointing out that endorsements of small- N designs are not a one-size-fits-all proposition, Smith and Little made a case for accommodating both small- and large- N: “When the goal is to estimate population parameters,…then the recommendation to increase sample size at the participant level is an appropriate one” ([ 18 , 95 ] and Colling and Szucs’ “pragmatic pluralism” in calling for the adoption of both frequentist and Bayesian inferential approaches in psychological research. According to Colling and Szucs, “statistical reform is necessary because it is necessary to have the right tool for the right job in a complete system of scientific inference” [ 21 N design as was possible with memory (e.g., [ 73 , 75 ]) in this case, for example, one simply adopts the appropriate design and the relevant statistical analyses. Such a methodological position is similar, at least in spirit, to Holtz’s [ 110 ] recommendations for epistemological solutions to the ongoing crises of confidence in psychology.

As Smith and Little put it, “[i]n environments that can be explored at the individual level and the phenomenon of interest is expressed as an individual-level mechanism, small- N studies have enormous inferential power and are preferable to group-level inference precisely because they place the burden of sampling at the appropriate level, that of the individual” ([ 18 , 27 ]).

The ongoing crises of confidence in psychology have been attributed variously to a collection of related factors in the practice of our science. The remedies need not, however, be a one-track solution aimed mainly at the research practices of only one of psychology’s long-established traditions. The results reported here are remarkable and noteworthy in validating these historically important psychological findings outside of the laboratory. They are a testament to the reliability of those reproducible effects.

What we have today is attention to the inferential statistical considerations of only one of psychology’s research traditions, with outright neglect of the other, well-nourished and empirically productive alternative. What is required, rather, is a more pragmatic approach. That approach gives considered attention to the research question, to the selection of appropriate research designs and analyses, and to an informed theoretical framework in which to situate our understanding of the outcomes. This position takes no side on whether psychology’s crises of confidence arose from the statistical tool user or the tool itself, a question alluded to above. What matters is that the research question drives the informed choice of design and the educated use of the relevant tools, statistical and otherwise. The choice of designs and of the appropriate statistical and/or other tools is, of course, within the purview of expert judgement [ 25 , 51 ] exercised by the researcher in his/her research domain.

Availability of data and materials

All data generated or analyzed in this study are included in this published article. The raw datasets used and analyzed during the study are available from the author on reasonable request.

Abbreviations

Questionable research practices

Null hypothesis statistical testing

Law of large numbers

Statistical significance testing

Statistical hypothesis testing

American Psychological Association

Instructor’s materials

Immediate memory span

Depth of processing

References

Hanin L. Cavalier use of inferential statistics is a major source of false and irreproducible scientific findings. Mathematics. 2021;9:603. https://doi.org/10.3390/math9060603 .


Ioannidis JPA. Why science is not necessarily self-correcting. Perspect Psychol Sci. 2012;7:645–54. https://doi.org/10.1177/1745691612464056 .


Chung S, Fink EL. One of the most cited persuasion studies but no success in replication: investigating replication using Petty, Cacioppo, and Goldman (1981) as an example. Ann Int Commun Assoc. 2018;42:1–20. https://doi.org/10.1080/23808985.2018.1425100 .

Bem DJ. Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect. J Personal Soc Psychol. 2011;100:407–25. https://doi.org/10.1037/a0021524 .

Ritchie SJ, Wiseman R, French CC. Failing the future: three unsuccessful attempts to replicate Bem’s retroactive facilitation of recall effect. PLoS ONE. 2012;7(3):e33423. https://doi.org/10.1371/journal.pone.0033423 .


Cesario J. Priming, replication, and the hardest science. Perspect Psychol Sci. 2014;9:40–8. https://doi.org/10.1177/1745691613513470 .

Ferguson MJ, Carter TJ, Hassin RR. Commentary on the attempt to replicate the effect of the American flag on increased Republican attitudes. Soc Psychol. 2014;45:299–311. https://doi.org/10.1027/1864-9335/a000202 .

Klatzky RL, Creswell JD. An intersensory interaction account of priming effects—and their absence. Perspect Psychol Sci. 2014;9:49–58. https://doi.org/10.1177/1745691613513468 .

Klein RA, Ratliff KA, Vianello M, Adams RB, Bahnik S, Bernstein MJ, Nosek BA. Investigating variation in replication: a “many labs” replication project. Soc Psychol. 2014;45:142–52. https://doi.org/10.1027/1864-9335/a000178 .

Spellman BA. A short (personal) future history of revolution 2.0. Perspect Psychol Sci. 2015;10:886–99. https://doi.org/10.1177/1745691615609918 .

Holland SM. Estimation, not significance. Paleobiology. 2019;45:1–6. https://doi.org/10.1017/pab.2018.43 .

McManus E, Turner D, Sach T. Can you repeat that? Exploring the definition of a successful model replication in health economics. PharmacoEconomics. 2019;37:1371–81. https://doi.org/10.1007/s40273-019-00836-y .

Roloff J, Zyphur MJ. Null findings, replication and preregistered studies in business ethics research. J Bus Ethics. 2019;160:609–19. https://doi.org/10.1007/074193251661116 .

Wohl MJA, Tabri N, Zelenski JM. The need for open science practices and well-conducted replications in the field of gambling studies. Int Gamb Stud. 2019;19:369–76. https://doi.org/10.1080/14459745.2019.1672769 .

Vermeulen I, Beukeboom CJ, Batenburg A, Avramiea A, Stoyanov D, van de Velde B, Oegema D. Blinded by the light: how a focus on statistical ‘significance’ may cause p-value misreporting and an excess of p-values just below .05 in communication science. Commun Methods Meas. 2015;9:253–79. https://doi.org/10.1080/19312458.2015.1096333 .

Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22:1359–66. https://doi.org/10.1177/0956797611417632 .

Little DR, Smith PL. Replication is already mainstream: lessons from small-N designs. Behav Brain Sci. 2018;41:141. https://doi.org/10.1017/S0140525X18000766 .

Smith PL, Little DR. Small is beautiful: in defense of the small-N design. Psychon Bull Rev. 2018;25:2083–101. https://doi.org/10.3758/s13423-018-1451-8 .

Cumming G. The new statistics: why and how. Psychol Sci. 2014;25:7–29. https://doi.org/10.1177/0956797613504966 .

Pashler H, Wagenmakers E. Special section on replicability in psychological science: A crisis of confidence? Perspect Psychol Sci. 2012;7:528–30. https://doi.org/10.1177/1745691612465253 .

Colling LJ, Szucs D. Statistical inference and the replication crisis. Rev Philos Psychol. 2021;12:121–47. https://doi.org/10.1007/s13164-018-0421-4 .

Cumming G, Fidler F. Confidence intervals: better answers to better questions. J Psychol. 2009;217:15–26. https://doi.org/10.1027/0044-3409-217.1.15 .

Kruschke JK, Liddell TM. The Bayesian new statistics: hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychon Bull Rev. 2018;25:178–206. https://doi.org/10.3758/s13423-016-1221-4 .

Wagenmakers E-J. A practical solution to the pervasive problems of p values. Psychon Bull Rev. 2007;14:779–804. https://doi.org/10.3758/BF03194105 .

Imam AA. Historically recontextualizing Sidman’s Tactics: how behavior analysis avoided psychology’s methodological Ouroboros. J Exp Anal Behav. 2021;115:115–28. https://doi.org/10.1002/jeab.661 .

Hurtado-Parrado C, Lopez-Lopez W. Single-case research methods: history and suitability for a psychological science in need of alternatives. Integr Psychol Behav Sci. 2015;49:323–49. https://doi.org/10.1007/s12124-014-9290-2 .

Normand MP. Less is more: psychologists can learn more by studying fewer people. Front Psychol. 2016;7:934. https://doi.org/10.3389/fpsyg.2016.00934 .

Falk R, Greenbaum CW. Significance tests die hard: the amazing persistence of a probabilistic misconception. Theory Psychol. 1995;5:75–98. https://doi.org/10.1177/0959354395051004 .

Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:0696–701. https://doi.org/10.1371/journal.pmed.0020124 .

Morrison DE, Henkel RE, editors. The significance test controversy: a reader. London: Aldine; 1970.

Nickerson RS. Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods. 2000;5:241–301. https://doi.org/10.1037/1082-989X.5.2.241 .

Rozeboom WW. The fallacy of the null-hypothesis significance test. Psychol Bull. 1960;57:416–28. https://doi.org/10.1037/h0042040 .

Schmidt FL, Hunter JE. Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In: Harlow LL, Mulaik SA, Steiger JH, editors. What if there were no significance tests? Hillsdale: Lawrence Erlbaum; 1997. p. 37–64.

Grice J, Barrett P, Cota L, Felix C, Taylor Z, Garner S, Medellin E, Vest A. Four bad habits of modern psychologists. Behav Sci. 2017;7:53–83. https://doi.org/10.3390/bs7030053 .

Imam AA, Frate M. A snapshot look at replication and statistical reporting practices in psychology journals. Eur J Behav Anal. 2019;20:204–29. https://doi.org/10.1080/15021149.2019.1680179 .

Schneider JW. Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations. Scientometrics. 2015;102:411–32. https://doi.org/10.1007/s11192-014-1251-5 .

Lambdin C. Significance tests as sorcery: science is empirical—significance tests are not. Theory Psychol. 2012;22:67–90. https://doi.org/10.1177/0959354311429854 .

Bernard C. An introduction to the study of experimental medicine. Dover Publications; 1927/1957.

Branch M. Malignant side effects of null-hypothesis significance testing. Theory Psychol. 2014;24:256–77. https://doi.org/10.1177/0959354314525282 .

Harlow LL, Mulaik SA, Steiger JH, editors. What if there were no significance tests? Hillsdale: Lawrence Erlbaum; 1997.

Gandevia S, Cumming G, Amrhein V, Butler A. Replication: do not trust your p-value, be it small or large. J Physiol. 2021;599:2989–90. https://doi.org/10.1113/JP281614 .

Spellman BA. Special section on research practices. Perspect Psychol Sci. 2012;7:655–89. https://doi.org/10.1177/1745691612465075 .

Barry AE, Valdez D, Goodson P, Szucs L, Reyes JV. Moving college health research: reconsidering our reliance on statistical significance testing. J Am Coll Health. 2019;67:181–8. https://doi.org/10.1080/07448481.2018.1470091 .

Estes WK. On the communication of information by displays of standard errors and confidence intervals. Psychon Bull Rev. 1997;4:330–41. https://doi.org/10.3758/BF03210790 .

Schmidt FL, Hunter JE. Are there benefits from NHST? Am Psychol. 2002;57:65–6. https://doi.org/10.1037/0003-066X.57.1.65 .

Tryon WW. Replication is about effect size: comment on Maxwell, Lau, and Howard (2015). Am Psychol. 2016;71:236–7. https://doi.org/10.1037/a0040191 .

Watson JC, Lenz AS, Schmit MK, Schmit EL. Calculating and reporting estimates of effect size in counseling outcomes research. Couns Outcome Res Eval. 2016;7:111–23. https://doi.org/10.1177/2150137816660584 .

Dienes Z. How Bayes factors change scientific practice. J Math Psychol. 2015;72:78–89. https://doi.org/10.1016/j.jmp.2015.10.003 .

American Psychological Association. Publication manual of the American Psychological Association: the official guide to APA style. 7th ed. Washington: APA; 2020.

Dukes WF. N = 1. Psychol Bull. 1965;64:74–9. https://doi.org/10.1037/h0021964 .

Sidman M. Tactics of scientific research: evaluating experimental data in psychology. Authors Cooperative; 1960.

Harrison JM, Turnock MT. Animal psychophysics: improvements in the tracking method. J Exp Anal Behav. 1975;23:141–7. https://doi.org/10.1901/jeab.1975.23-141 .

Krantz JH. Psychophysics. In: Experiencing sensation and perception (Chapter 2) (n.d.). https://psych.hanover.edu/classes/sensation/chapters/Chapter%202.pdf .

Krantz JH. Psychophysics. In: Davis SF, Buskist W, editors. 21st Century psychology: a reference handbook. Thousand Oaks: Sage Publications; 2008. p. 177–86. https://doi.org/10.4135/9781412956321.n20 .

Read JCA. The place of human psychophysics in modern neuroscience. Neuroscience. 2015;296:116–29. https://doi.org/10.1016/j.neuroscience.2014.05.036 .

White KG, Wixted JT. Psychophysics of remembering. J Exp Anal Behav. 1999;71:91–113. https://doi.org/10.1901/jeab.1999.71-91 .

Blakemore C, Sutton P. Size adaptation: a new aftereffect. Science. 1969;166:245–247.

Stigler SM. A historical view of statistical concepts in psychology and educational research. Am J Educ. 1992;101:60–70. https://doi.org/10.1086/444032 .

Branch M. Statistical inference in behavior analysis: some things significance testing does and does not do. Behav Anal. 1999;22:87–92. https://doi.org/10.1007/BF03391984 .

Perone M. Statistical inference in behavior analysis: experimental control is better. Behav Anal. 1999;22:109–16. https://doi.org/10.1007/BF03391988 .

Saville BK. Single-subject designs. In: Davis SF, Buskist W, editors. 21st Century psychology: a reference handbook. Thousand Oaks: Sage Publications; 2008. p. 80–92. https://doi.org/10.4135/9781412956321.n10 .

Boring EG. The nature and history of experimental control. Am J Psychol. 1954;67:573–89. https://doi.org/10.2307/1418483 .

Branch M. Lessons worth repeating: Sidman’s Tactics of Scientific Research. J Exp Anal Behav. 2021;115:44–55. https://doi.org/10.1002/jeab.643 .

Poling A, Methot LL, LeSage MG. Fundamentals of behavior analytic research. Plenum Press; 1995.

Boring EG. The beginning and growth of measurement in psychology. Isis. 1961;52:238–57. https://doi.org/10.1086/349471 .

Catania AC. Learning. Austell: Sloan Publishing; 2007.

Bachelder BL, Delprato DJ. The simple memory span experiment: a behavioral analysis. Psychol Rec. 2017;67:423–33. https://doi.org/10.1007/s40732-017-0222-7 .

Ferguson CJ. “Everyone knows psychology is not a real science”: public perceptions of psychology and how we can improve our relationship with policymakers, the scientific community, and the general public. Am Psychol. 2015;70:527–42. https://doi.org/10.1037/a0039405 .

Francis G. Publication bias and the failure of replication in experimental psychology. Psychon Bull Rev. 2012;19:975–91. https://doi.org/10.3758/s13423-012-0322-y .

Huffmeier J, Mazei J, Schultze T. Reconceptualizing replication as a sequence of different studies: a replication typology. J Exp Soc Psychol. 2016;66:81–92. https://doi.org/10.1016/j.jesp.2015.09.009 .

Gigerenzer G, Marewski JN. Surrogate science: the idol of a universal method for scientific inference. J Manag. 2015;41:421–40. https://doi.org/10.1177/0149206314547522 .

Laws KR. Psychology, replication and beyond. BMC Psychology. 2016;4:30. https://doi.org/10.1186/s40359-016-0135-2 .

Guilford JP, Dallenbach KM. The determination of memory span by the method of constant stimuli. Am J Psychol. 1925;36:621–8. https://doi.org/10.2307/1413916 .

Miller GA. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev. 1956;63:81–97. https://doi.org/10.1037/h0043158 .

Oberly HS. A comparison of the span of attention and memory. Am J Psychol. 1928;40:295–302. https://doi.org/10.2307/1414490 .

Murray DJ. Research on human memory in the nineteenth century. Can J Psychol Rev Can Psychol. 1976;30:201–20. https://doi.org/10.1037/h0082062 .

Craik FIM, Lockhart RS. Levels of processing: a framework for memory research. J Verb Learn Verb Behav. 1972;11:671–84. https://doi.org/10.1016/S0022-5371(72)80001-X .

Pollack I. Assimilation of sequentially encoded information. Am J Psychol. 1953;66:421–35. https://doi.org/10.2307/1418237 .

Carmichael L, Hogan HP, Walter AA. An experimental study of the effect of language on the reproduction of visually perceived form. J Exp Psychol. 1932;15:73–86. https://doi.org/10.1037/h0072671 .

Munsterberg H. Studies from the Harvard psychological laboratory (I): memory. Psychol Rev. 1894;1:34–60. https://doi.org/10.1037/h0068876 .

Henmon VAC. The relation between learning and retention and amount to be learned. J Exp Psychol. 1917;2:476–84. https://doi.org/10.1037/h0070292 .

Luh CW. The conditions of retention. Psychol Monogr. 1922;31:i–87. https://doi.org/10.1037/h0093177 .

Mibai S. The effects of repetitions upon retention. J Exp Psychol. 1922;5:147–51. https://doi.org/10.1037/h0070099 .

Sauer FM. The relative variability of nonsense syllables and words. J Exp Psychol. 1930;13:235–46. https://doi.org/10.1037/h0075309 .

Murre JMJ, Dros J. Replication and analysis of Ebbinghaus’ forgetting curve. PLoS ONE. 2015;10:e0120644. https://doi.org/10.1371/journal.pone.0120644 .

Tulving E. Ebbinghaus’s memory: What did he learn and remember? J Exp Psychol Learn Mem Cognit. 1985;11:485–90. https://doi.org/10.1037/0278-7393.11.3.485 .

Kirkpatrick EA. An experimental study of memory. Psychol Rev. 1894;1:602–9. https://doi.org/10.1037/h0068244 .

Hubbard R, Ryan PA. The historical growth of statistical significance testing in psychology—and its future prospects. Educ Psychol Meas. 2000;60:661–81. https://doi.org/10.1177/0013164400605001 .

Moscovitch M, Craik FIM. Depth of processing, retrieval cues, and uniqueness of encoding as factors in recall. J Verb Learn Verb Behav. 1976;15:447–58. https://doi.org/10.1016/S0022-5371(76)90040-2 .

Bobrow SA, Bower GH. Comprehension and recall of sentences. J Exp Psychol. 1969;80:455–61. https://doi.org/10.1037/h0027461 .

Hyde TS, Jenkins JJ. The differential effects of incidental tasks on the organization of recall of a list of highly associated words. J Exp Psychol. 1969;82:472–81. https://doi.org/10.1037/h0028372 .

Johnston CD, Jenkins JJ. Two more incidental tasks that differentially affect associative clustering in recall. J Exp Psychol. 1971;89:92–5. https://doi.org/10.1037/h0031184 .

Rosenberg S, Schiller WJ. Semantic coding and incidental sentence recall. J Exp Psychol. 1971;90:345–6. https://doi.org/10.1037/h0031559 .

Tresselt ME, Mayzner MS. A study of incidental learning. J Psychol. 1960;50:339–47. https://doi.org/10.1080/00223980.1960.9916451 .

Postman L. Hermann Ebbinghaus. Am Psychol. 1968;23:149–57. https://doi.org/10.1037/h0025659 .

Bernstein DA. Essentials of psychology. Wadsworth: Cengage Learning; 2010.

Cowan N. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav Brain Sci. 2000a;24:87–114. https://doi.org/10.1017/S0140525X01003922 .

Cowan N. Metatheory of storage capacity limits. Behav Brain Sci. 2000b;24:154–85. https://doi.org/10.1017/S0140525X0161392X .

Bachelder BL. The magical number 4 = 7: span theory on capacity limitations. Behav Brain Sci. 2000;24:116–7. https://doi.org/10.1017/S0140525X01243921 .

Baddeley A. The magic number and the episodic buffer. Behav Brain Sci. 2000;24:117–8. https://doi.org/10.1017/S0140525X01253928 .

Kawai N, Matsuzawa T. “Magical number 5” in a chimpanzee. Behav Brain Sci. 2000;24:127–8. https://doi.org/10.1017/S0140525X0135392X .

Towse JN. Memory limits: “Give us an answer!” Behav Brain Sci. 2000;24:150–1. https://doi.org/10.1017/S0140525X01573926 .

Gantman A, Gomila R, Martinez JE, Matias EN, Paluck EL, Starck J, Wu S, Yaffe N. A pragmatist philosophy of psychological science and its implications for replication. Behav Brain Sci. 2018;41:e127. https://doi.org/10.1017/S0140525X18000626 .

Stewart SM. Some physics demonstration experiments. Science Papers. 2005, pp 121–133. https://www.researchgate.net/publication/256120711_Some_simple_physics_demonstration_experiments .

Cowles M. Statistics in psychology: an historical perspective. Hillsdale: Lawrence Erlbaum; 2001.

Lemon CJ, King SA, Davidson KA, Berryessa TL, Gajjar SA, Sacks LH. An inadvertent concurrent replication: same roadmap, different journey. Remed Spec Educ. 2016;37:213–22. https://doi.org/10.1177/074193251661116 .

Meehl PE. Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and slow progress of soft psychology. J Consult Clin Psychol. 1978;46:806–34. https://doi.org/10.1037/0022-006X.46.4.806 .

Cohen J. Things I have learned (so far). Am Psychol. 1990;45:1304–12. https://doi.org/10.1037/0003-066X.45.12.1304 .

Davidson IJ. The Ouroboros of psychological methodology: the case of effect sizes (Mechanical objectivity vs. expertise). Rev Gen Psychol. 2018;22:469–76. https://doi.org/10.1037/gpr0000154 .

Holtz P. Two questions to foster critical thinking in the field of psychology: Are there any reasons to expect a different outcome, and what are the consequences if we don’t find what we were looking for? Meta-Psychology. 2020;4:1–14. https://doi.org/10.15626/MP.2018.984 .

Russell MK, Hall MD. Responding to confidence and reproducibility crises: registered reports and replication in auditory perception and cognition. Audit Percept Cognit. 2019;2:181–7. https://doi.org/10.1080/25742442.2020.1790151 .

Levine SS. Show us your data: connect the dots, improve science. Manag Organ Rev. 2018;14:433–7. https://doi.org/10.1017/mor.2018.19 .

Kipfmiller KJ, Brodhead MT, Wolfe K, LaLonde K, Sipila ES, Bak MYS, Fisher MH. Training frontline employees to conduct visual analysis using a clinical decision-making model. J Behav Educ. 2019;28:301–22. https://doi.org/10.1007/s10864-018-09318-1 .

Ninci J, Vannest KJ, Willson V, Zhang N. Interrater agreement between visual analysts of single-case data: a meta-analysis. Behav Modif. 2015;39:510–41. https://doi.org/10.1177/014515581327 .

Retzlaff BJ, Phillips LA, Fisher WW, Hardee AM, Fuhrman AM. Using e-learning modules to teach ongoing-visual inspection of functional analysis. J Appl Behav Anal. 2020;53:2126–38. https://doi.org/10.1002/jaba.719 .

Wolfe K, McCammon MN, LeJeune LM, Holt AK. Training preservice practitioners to make data-based instructional decisions. J Behav Educ. 2021. https://doi.org/10.1007/s10864-021-09439-0 .

Piaget J. The construction of reality in the child. Cook M, translator. Basic Books; 1954.

Köhler W. The mentality of apes. New York: Liveright; 1925.

Köhler W. Gestalt psychology: an introduction to new concepts in psychology. New York: Liveright; 1947.

Pavlov IP. Conditioned reflexes. Dover Publications; 1927/1960.

Skinner BF. The behavior of organisms: an experimental analysis. La Jolla: Copley Publishing Group; 1938.

Piaget J, Inhelder B, Szeminska A. The child’s conception of geometry. New York: Routledge; 1960.

Wan H, Kirkman C, Jensen G, Hackenberg TD. Failure to find altruistic food sharing in rats. Front Psychol. 2021;12:696025. https://doi.org/10.3389/fpsyg.2021.696025 .

Ben-Ami Bartal I, Decety J, Mason P. Empathy and pro-social behavior in rats. Science. 2011;334:1427–30. https://doi.org/10.1126/science.1210789 .

Sato N, Tan L, Tate K, Okada M. Rats demonstrate helping behavior toward a soaked conspecific. Anim Cognit. 2015;18:1039–47. https://doi.org/10.1007/s10071-015-0872-2 .

Hachiga Y, Schwartz LP, Silberberg A, Kearns DN, Gomez M, Slotnick B. Does a rat free a trapped rat due to empathy or for sociality? J Exp Anal Behav. 2018;110:267–74. https://doi.org/10.1002/jeab.464 .

Iversen IH. Sidman or statistics? J Exp Anal Behav. 2021;115:102–14. https://doi.org/10.1002/jeab.660 .

Acknowledgements

Dedicated to the memory of my father, H. E. Ambassador Abdulkadir M. S. Imam, who gave me roots to know who I am, and wings to discover the world. An earlier version of this paper was presented at the 32nd annual meeting of the Association for Psychological Science at its 2020 Virtual Poster Showcase. The author thanks Frank Zenker and two anonymous reviewers for helpful comments and suggestions.

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations

Department of Psychology, John Carroll University, 1 John Carroll Blvd, University Heights, OH, 44118, USA

Abdulrazaq A. Imam

Contributions

AI is the sole author who carried out this study. The author read and approved the final manuscript.

Corresponding author

Correspondence to Abdulrazaq A. Imam .

Ethics declarations

Ethics approval and consent to participate

Informed consent was not required by the John Carroll University (JCU) Institutional Review Board (IRB: Log# 2022-005) as the IRB deemed the study exempt under Exemption #2 of the 2018 Requirements of the Code of Federal Regulations, 45 CFR 46.104(d)(2). No additional ethics approval was required.

Consent for publication

Not applicable.

Competing interests

The author declares that there is no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Imam, A.A. Remarkably reproducible psychological (memory) phenomena in the classroom: some evidence for generality from small-N research. BMC Psychol 10, 274 (2022). https://doi.org/10.1186/s40359-022-00982-7

Received : 14 December 2021

Accepted : 09 November 2022

Published : 22 November 2022

DOI : https://doi.org/10.1186/s40359-022-00982-7

  • Reproducibility
  • Replication
  • Experimental design
  • Small-N designs
  • History of psychology

BMC Psychology

ISSN: 2050-7283

Measurement invariance of the modified Utrecht Homesickness Scale: a case of university students from four countries

  • Published: 09 May 2024

  • Sofya Nartova-Bochaver (ORCID: orcid.org/0000-0002-8061-4154; affiliation 1)
  • Sofia Reznichenko (ORCID: orcid.org/0000-0002-7930-8790; affiliation 1)
  • Alfonso Padilla Ochoa (ORCID: orcid.org/0009-0002-7114-3001; affiliation 1)
  • Zulkarnain Zulkarnain (ORCID: orcid.org/0000-0002-3707-1844; affiliation 2)

This study examines the measurement invariance of the Utrecht Homesickness Scale (UHS) across four countries while taking cultural characteristics into account. The sample consisted of 899 first- and second-year students from Indonesia (N = 182), Mexico (N = 142), Russia (N = 379), and Ukraine (N = 196) (M age = 18.29, SD age = 2.48); 74% were female (M age = 18.56, SD age = 2.49). In its original version, the UHS consists of twenty items and five subscales: Adjustment difficulties, Missing family, Loneliness, Missing friends, and Ruminations about home. As hypothesized, the initial five-factor structure of the UHS was upheld, but the homesickness (HS) pattern was specific to each country investigated. After three items were removed on the basis of confirmatory factor analysis (CFA), the original structure was restored. Convergent validity, reliability, and configural, metric, and partial scalar measurement invariance were achieved for the modified instrument (UHS-M). The specific patterns of homesickness in each country are presented. The authors conclude that the UHS-M can be recommended for both research and support programs for students suffering from homesickness, provided cultural characteristics are taken into account.
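The abstract reports configural, metric, and partial scalar invariance but does not state the decision rule used to compare the nested models. As a rough illustration of how such a sequence is usually judged, the sketch below assumes a simplified reading of Chen's (2007) widely cited cutoffs: a more constrained model is retained only if CFI drops by less than .010 and RMSEA rises by less than .015 relative to the model it is nested in. The names `FitResult` and `compare`, the thresholds, and every fit value are hypothetical and are not results from the study; in practice the indices would come from multi-group CFA models fitted in SEM software.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    """Global fit indices of one multi-group CFA model (values supplied by the user)."""
    name: str
    cfi: float
    rmsea: float


def compare(baseline: FitResult, constrained: FitResult,
            max_cfi_drop: float = 0.010, max_rmsea_rise: float = 0.015):
    """Judge one nested comparison with an ASSUMED, simplified reading of Chen (2007).

    The more constrained model is retained only if CFI falls by less than
    `max_cfi_drop` AND RMSEA rises by less than `max_rmsea_rise`.
    Returns (CFI drop, RMSEA rise, invariance_holds).
    """
    cfi_drop = baseline.cfi - constrained.cfi
    rmsea_rise = constrained.rmsea - baseline.rmsea
    holds = cfi_drop < max_cfi_drop and rmsea_rise < max_rmsea_rise
    return round(cfi_drop, 3), round(rmsea_rise, 3), holds


# Hypothetical fit values for illustration only; NOT results from the study.
configural = FitResult("configural", cfi=0.950, rmsea=0.050)
metric = FitResult("metric", cfi=0.945, rmsea=0.051)
scalar = FitResult("scalar", cfi=0.920, rmsea=0.060)
partial = FitResult("partial scalar", cfi=0.941, rmsea=0.053)

# Each constrained model is compared with the model it is nested in.
# The partial scalar model (some intercepts freed) is compared with metric, not full scalar.
for base, model in [(configural, metric), (metric, scalar), (metric, partial)]:
    cfi_drop, rmsea_rise, holds = compare(base, model)
    print(f"{model.name} vs {base.name}: CFI drop {cfi_drop}, RMSEA rise {rmsea_rise}, holds={holds}")
```

With these made-up numbers, full scalar invariance is rejected while the partial scalar model is retained, which is the kind of pattern the phrase "partial scalar measurement invariance" describes. Other studies use different or additional criteria (for example, SRMR changes or chi-square difference tests), so the thresholds are left as parameters.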

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbas, J., Aqeel, M., Wenhong, Z., Aman, J., & Zahra, F. (2018). The moderating role of gender inequality and age among emotional intelligence, homesickness and development of mood swings in university students. International Journal of Human Rights in Healthcare , 11 (5), 356–367. https://doi.org/10.1108/ijhrh-11-2017-0071

Article   Google Scholar  

Archer, J., Ireland, J., Amos, S. L., Broad, H., & Currid, L. (1998). Derivation of a homesickness scale. British Journal of Psychology , 89 (2), 205–221. https://doi.org/10.1111/j.2044-8295.1998.tb02681.x

Article   PubMed   Google Scholar  

Basuki, R., & Riani, A. L. (2018). Predicting employee’s intention to leave: The role of homesickness and cross-cultural adjustment among employees assigned across Indonesia. International Journal of Business and Society , 19 (S4), 605–619.

Cahill, K. M., Updegraff, K. A., Causadias, J. M., & Korous, K. M. (2021). Familism values and adjustment among Hispanic/Latino individuals: A systematic review and meta-analysis. Psychological Bulletin , 147 (9), 947. https://doi.org/10.1037/bul000033610.1037/bul0000336

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling a Multidisciplinary Journal , 14 , 464–504. https://doi.org/10.1080/10705510701301834

Cieraad, I. (2010). Homes from home: Memories and projections. Home Cultures , 7 (1), 85–102. https://doi.org/10.2752/175174210X12591523182788

Cohen, J. (1988). Statistical power analysis for the behavioral sciences  (2nd edn). Lawrence Erlbaum Associates.

Demetriou, E. A., Boulton, K. A., Bowden, M. R., Thapa, R., & Guastella, A. J. (2022). An evaluation of homesickness in children: A systematic review and meta-analysis. Journal of Affective Disorders , 297 , 463–470. https://doi.org/10.1016/j.jad.2021.09.068

Duru, E., & Balkis, M. (2013). The psychometric properties of the Utrecht Homesickness Scale: A study of reliability and validity. Eurasian Journal of Educational Research , 52 , 61–78.

Google Scholar  

Dutta-Bergman, M. J., & Wells, W. D. (2002). The values and lifestyles of idiocentrics and allocentrics in an individualist culture: A descriptive approach. Journal of Consumer Psychology , 12 (3), 231–242. https://doi.org/10.1207/153276602760335077

Ejei, J., Ganjavi, A., & Khodapanahi, M. K. (2008). Validation of Utrecht Homesickness Scale in students. International Journal of Behavioral Sciences , 2 (1), 1–12.

English, T., Davis, J., Wei, M., & Gross, J. J. (2017). Homesickness and adjustment across the first year of college: A longitudinal study. Emotion , 17 (1), 1. https://doi.org/10.1037/emo0000235

Fisher, S. (1989). Homesickness, cognition and health . (1st ed.) Psychology Press. https://doi.org/10.4324/9781315636900

Flanders, J. (2014). The making of home ([edition unavailable]). Atlantic Books. Retrieved from https://www.perlego.com/book/117425/the-making-of-home-the-500year-story-of-how-our-houses-became-homes-pdf (Original work published 2014).

Furnham, A. (2021). Culture shock, homesickness and adaptation to a foreign culture. In M. van Tilburg, A.J.J.M. Vingerhoets (Eds.), Psychological aspects of geographical moves (pp. 17–34). Amsterdam University Press. https://doi.org/10.1017/9789048504169.003

García-Sílberman, S. (2002). Un modelo explicativo de la conducta hacia la enfermedad mental. Salud pública de México , 44 (4), 289–296.

Götz, F. M., Stieger, S., & Reips, U. D. (2019). The emergence and volatility of homesickness in exchange students abroad: A smartphone-based longitudinal study. Environment and Behavior , 51 (6), 689–716. https://doi.org/10.1177/001391651875461

Haerpfer, C., Inglehart, R., Moreno, A., Welzel, C., Kizilova, K., Diez-Medrano, J., Lagos, M., Norris, P., Ponarin, E., & Puranen, B. (2023). World values survey wave 7 (2017–2023) cross-national data-set . https://doi.org/10.14281/18241.18

Hofstede Insights (2023). Retrieved from www.hofstede-insights.com/ . Accessed 28 Apr 2023.

Jablonskytė, G. (2012). Lietuvoje gyvenančių užsienio studentų namų ilgesio ir gyvenimo kokybės sąsajos informacija. Tarptautinis psichologijos žurnalas: biopsichosocialinis požiūris, 10 , 151–171.

Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2022). SemTools: Useful 445 tools for structural equation modeling. R Package version 0.5-6. Available Online at: https://cran.r446project.org/web/packages/semTools/semTools.pdf . Accessed 28 Apr 2023.

Mekonen, Y. K., & Adarkwah, M. A. (2023). Exploring homesickness among international students in China during border closure. International Journal of Intercultural Relations , 94 , 1–15. https://doi.org/10.1016/j.ijintrel.2023.101800

Nartova-Bochaver, S., Reznichenko, S., Bardadymov, V., Khachaturova, M., Yerofeyeva, V., Khachatryan, N., Kryazh, I., Kamble, S., & Zulkarnain Zulkarnain. (2022). Measurement invariance of the short home attachment scale: A cross-cultural study. Frontiers in Psychology , 834421 (13), 1–9. https://doi.org/10.3389/fpsyg.2022.834421

Nauta, M. H., aan het Rot, M., Schut, H., & Stroebe, M. (2020). Homesickness in social context: An ecological momentary assessment study among 1st-year university students. International Journal of Psychology , 55 (3), 392–397. https://doi.org/10.1002/ijop.12586

Pardede, G. (2015). Homesickness among international college students: the impact of social embeddedness and connection to home (Doctoral dissertation, Baylor University). Texas Digital Library.

Poyrazli, S., & Devonish, O. B. (2020). Cultural value orientation, social networking site (SNS) use, and homesickness in international students. International Social Science Review , 96 (3), 1–22.

Poyrazli, S., & Lopez, M. D. (2007). An exploratory study of perceived discrimination and homesickness: A comparison of international students and American students. The Journal of Psychology , 141 (3), 263–280. https://doi.org/10.3200/JRLP.141.3.263-280

Rajguru, A. J., & Srivastava, G. (2020). A cross-sectional study of the relationship between homesickness, sense of belongingness and perceived control among college students. IAHRW International Journal of Social Sciences Review , 8 (4–6), 119–136.

R Core Team (2022). R: a language and environment for statistical computing. R Foundation for 473 Statistical Computing, Vienna, Austria. Available Online at: https://www.r-project.org/ . Accessed 28 Apr 2023.

Revelle, W. (2024). psych: Procedures for psychological, psychometric, and Personality Research. R 482 Package Version 2.4.3. Available Online at: https://cran.r-project.org/web/packages/psych/psych.pdf . Accessed 28 Apr 2023.

Rosseel, Y. (2022). lavaan: Latent Variable Analysis. an R package for structural equation modeling. 491 R Package Version 0.6–12. Available Online at: https://cran.r492project.org/web/packages/lavaan/lavaan.pdf . Accessed 28 Apr 2023.

Sezer, Ş., Karabacak, N., & Narseyitov, M. (2021). A multidimensional analysis of homesickness based on the perceptions of international students in Turkey: A mixed method study. International Journal of Intercultural Relations, 83, 187–199. https://doi.org/10.1016/j.ijintrel.2021.06.001

Shoukat, S., Callixte, C., Nugraha, J., Budy, T. I., & Shoukat, H. (2021). Homesickness, anxiety and depression among Pakistani international students in Indonesia during Covid-19 outbreak. Jurnal Kesehatan Masyarakat, 17(2), 225–231. https://doi.org/10.15294/kemas.v17i2.31300

Stroebe, M., Van Vliet, T., Hewstone, M., & Willis, H. (2002). Homesickness among students in two cultures: Antecedents and consequences. British Journal of Psychology, 93(2), 147–168. https://doi.org/10.1348/000712602162508

Stroebe, M., Schut, H., & Nauta, M. (2015). Homesickness: A systematic review of the scientific literature. Review of General Psychology, 19(2), 157–171. https://doi.org/10.1037/gpr0000037

Stroebe, M., Schut, H., & Nauta, M. H. (2016). Is homesickness a mini-grief? Development of a dual process model. Clinical Psychological Science, 4(2), 344–358. https://doi.org/10.1177/216770261558530

Sun, J. (2015). Homesick at college: A predictive model for first-year first-time students (Doctoral dissertation, Iowa State University). Theses and Dissertations.

Sun, J., & Hagedorn, L. S. (2016). Homesickness at college: Its impact on academic performance and retention. Journal of College Student Development, 57(8), 943–957. https://doi.org/10.1353/csd.2016.0092

Thurber, C. A., & Walton, E. A. (2012). Homesickness and adjustment in university students. Journal of American College Health, 60(5), 415–419. https://doi.org/10.1080/07448481.2012.673520

Vingerhoets, A. J. J. M. (2021). The homesickness concept: Questions and doubts. In M. van Tilburg & A. J. J. M. Vingerhoets (Eds.), Psychological aspects of geographical moves (pp. 1–16). Amsterdam University Press. https://doi.org/10.1017/9789048504169.002


Watt, S. E., & Badger, A. J. (2009). Effects of social belonging on homesickness: An application of the belongingness hypothesis. Personality and Social Psychology Bulletin, 35(4), 516–530. https://doi.org/10.1177/0146167208329695

Wild, D., Grove, A., Martin, M., Eremenco, S., McElroy, S., Verjee-Lorenz, A., & Erikson, P. (2005). Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: Report of the ISPOR task force for translation and cultural adaptation. Value in Health, 8(2), 94–104. https://doi.org/10.1111/j.1524-4733.2005.04054.x

Wittrup, A., & Hurd, N. (2021). Extracurricular involvement, homesickness, and depressive symptoms among underrepresented college students. Emerging Adulthood, 9(2), 158–169. https://doi.org/10.1177/21676968198473

Zulkarnain, Z., Anggraini, D. D., Andriani, Y. E., & Maya, Y. (2019). Homesickness, locus of control and social support among first-year boarding-school students. Psychology in Russia: State of the Art, 12(2), 124–145. https://doi.org/10.11621/pir.2019.0210


This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University).

Author information

Authors and affiliations

Department of Psychology, National Research University Higher School of Economics (HSE University), Moscow, Russia

Sofya Nartova-Bochaver, Sofia Reznichenko & Alfonso Padilla Ochoa

Faculty of Psychology, Universitas Sumatera Utara, Padang Bulan, Medan, Sumatera Utara, Indonesia

Zulkarnain Zulkarnain


Contributions

Sofya NARTOVA-BOCHAVER: Conceptualization, Project administration, Resources, Methodology, Investigation (the Russian sample), Writing - Original draft preparation, Reviewing and Editing; Sofia REZNICHENKO: Resources, Methodology, Formal analysis, Writing-Reviewing and Editing; Alfonso PADILLA OCHOA: Investigation (the Mexican sample), Methodology, Writing-Reviewing and Editing; Zulkarnain ZULKARNAIN: Investigation (the Indonesian sample), Reviewing and Editing.

Corresponding author

Correspondence to Sofya Nartova-Bochaver.

Ethics declarations

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent for publication

Conflict of interest

Sofya Nartova-Bochaver, Sofia Reznichenko, Alfonso Padilla Ochoa, and Zulkarnain Zulkarnain each declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Electronic supplementary material: Supplementary file 1 (DOCX, 27 kB).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Nartova-Bochaver, S., Reznichenko, S., Ochoa, A.P. et al. Measurement invariance of the modified Utrecht Homesickness Scale: a case of university students from four countries. Curr Psychol (2024). https://doi.org/10.1007/s12144-024-06075-5


Accepted: 30 April 2024

Published: 09 May 2024

DOI: https://doi.org/10.1007/s12144-024-06075-5


  • Homesickness
  • The Utrecht Homesickness Scale
  • Structural validity
  • Measurement invariance

