
Single-Case Experimental Designs: A Systematic Review of Published Research and Current Standards

Justin D. Smith

Child and Family Center, University of Oregon

This article systematically reviews the research design and methodological characteristics of single-case experimental design (SCED) research published in peer-reviewed journals between 2000 and 2010. SCEDs provide researchers with a flexible and viable alternative to group designs with large sample sizes. However, methodological challenges have precluded widespread implementation and acceptance of the SCED as a viable complementary methodology to the predominant group design. This article includes a description of the research design, measurement, and analysis domains distinctive to the SCED; a discussion of the results within the framework of contemporary standards and guidelines in the field; and a presentation of updated benchmarks for key characteristics (e.g., baseline sampling, method of analysis). Overall, it provides researchers and reviewers with a resource for conducting and evaluating SCED research. The results of the systematic review of 409 studies suggest that recently published SCED research is largely in accordance with contemporary criteria for experimental quality. Analytic method emerged as an area of discord. Comparison of the findings of this review with historical estimates of the use of statistical analysis indicates an upward trend, but visual analysis remains the most common analytic method and also garners the most support amongst those entities providing SCED standards. Although consensus exists along key dimensions of single-case research design and researchers appear to be practicing within these parameters, there remains a need for further evaluation of assessment and sampling techniques and data analytic methods.

The single-case experiment has a storied history in psychology dating back to the field’s founders: Fechner (1889) , Watson (1925) , and Skinner (1938) . It has been used to inform and develop theory, examine interpersonal processes, study the behavior of organisms, establish the effectiveness of psychological interventions, and address a host of other research questions (for a review, see Morgan & Morgan, 2001 ). In recent years the single-case experimental design (SCED) has been represented in the literature more often than in past decades, as is evidenced by recent reviews ( Hammond & Gast, 2010 ; Shadish & Sullivan, 2011 ), but it still languishes behind the more prominent group design in nearly all subfields of psychology. Group designs are often professed to be superior because they minimize, although do not necessarily eliminate, the major internal validity threats to drawing scientifically valid inferences from the results ( Shadish, Cook, & Campbell, 2002 ). SCEDs provide a rigorous, methodologically sound alternative method of evaluation (e.g., Barlow, Nock, & Hersen, 2008 ; Horner et al., 2005 ; Kazdin, 2010 ; Kratochwill & Levin, 2010 ; Shadish et al., 2002 ) but are often overlooked as a true experimental methodology capable of eliciting legitimate inferences (e.g., Barlow et al., 2008 ; Kazdin, 2010 ). Despite a shift in the zeitgeist from single-case experiments to group designs more than a half century ago, recent and rapid methodological advancements suggest that SCEDs are poised for resurgence.

Single case refers to the participant or cluster of participants (e.g., a classroom, hospital, or neighborhood) under investigation. In contrast to an experimental group design in which one group is compared with another, participants in a single-case experiment provide their own control data for comparison in a within-subject rather than a between-subjects design. SCEDs typically involve a comparison between two experimental time periods, known as phases. This approach usually includes collecting a representative baseline phase to serve as a comparison with subsequent phases. In studies examining single subjects that are actually groups (i.e., a classroom or school), there are additional threats to the internal validity of the results, which, as Kratochwill and Levin (2010) note, include setting or site effects.

The central goal of the SCED is to determine whether a causal or functional relationship exists between a researcher-manipulated independent variable (IV) and a meaningful change in the dependent variable (DV). SCEDs generally involve repeated, systematic assessment of one or more IVs and DVs over time. The DV is measured repeatedly across and within all conditions or phases of the IV. Experimental control in SCEDs includes replication of the effect either within or between participants ( Horner et al., 2005 ). Randomization is another way in which threats to internal validity can be experimentally controlled. Kratochwill and Levin (2010) recently provided multiple suggestions for adding a randomization component to SCEDs to improve the methodological rigor and internal validity of the findings.

Examination of the effectiveness of interventions is perhaps the area in which SCEDs are most well represented ( Morgan & Morgan, 2001 ). Researchers in behavioral medicine and in clinical, health, educational, school, sport, rehabilitation, and counseling psychology often use SCEDs because they are particularly well suited to examining the processes and outcomes of psychological and behavioral interventions (e.g., Borckardt et al., 2008 ; Kazdin, 2010 ; Robey, Schultz, Crawford, & Sinner, 1999 ). Skepticism about the clinical utility of the randomized controlled trial (e.g., Jacobsen & Christensen, 1996 ; Wachtel, 2010 ; Westen & Bradley, 2005 ; Westen, Novotny, & Thompson-Brenner, 2004 ) has renewed researchers’ interest in SCEDs as a means to assess intervention outcomes (e.g., Borckardt et al., 2008 ; Dattilio, Edwards, & Fishman, 2010 ; Horner et al., 2005 ; Kratochwill, 2007 ; Kratochwill & Levin, 2010 ). Although SCEDs are relatively well represented in the intervention literature, it is by no means their sole home: Examples appear in nearly every subfield of psychology (e.g., Bolger, Davis, & Rafaeli, 2003 ; Piasecki, Hufford, Solham, & Trull, 2007 ; Reis & Gable, 2000 ; Shiffman, Stone, & Hufford, 2008 ; Soliday, Moore, & Lande, 2002 ). Aside from the current preference for group-based research designs, several methodological challenges have repressed the proliferation of the SCED.

Methodological Complexity

SCEDs undeniably present researchers with a complex array of methodological and research design challenges, such as establishing a representative baseline, managing the nonindependence of sequential observations (i.e., autocorrelation, serial dependence), interpreting single-subject effect sizes, analyzing the short data streams seen in many applications, and appropriately addressing the matter of missing observations. In the field of intervention research, for example, Hser et al. (2001) noted that studies using SCEDs are “rare” because of the minimum number of observations that are necessary (e.g., 3–5 data points in each phase) and the complexity of available data analysis approaches. Advances in longitudinal person-based trajectory analysis (e.g., Nagin, 1999), structural equation modeling techniques (e.g., Lubke & Muthén, 2005), time-series forecasting (e.g., autoregressive integrated moving averages; Box & Jenkins, 1970), and statistical programs designed specifically for SCEDs (e.g., Simulation Modeling Analysis; Borckardt, 2006) have provided researchers with robust means of analysis, but they might not be feasible methods for the average psychological scientist.

Application of the SCED has also expanded. Today, researchers use variants of the SCED to examine complex psychological processes and the relationship between daily and momentary events in peoples’ lives and their psychological correlates. Research in nearly all subfields of psychology has begun to use daily diary and ecological momentary assessment (EMA) methods in the context of the SCED, opening the door to understanding increasingly complex psychological phenomena (see Bolger et al., 2003 ; Shiffman et al., 2008 ). In contrast to the carefully controlled laboratory experiment that dominated research in the first half of the twentieth century (e.g., Skinner, 1938 ; Watson, 1925 ), contemporary proponents advocate application of the SCED in naturalistic studies to increase the ecological validity of empirical findings (e.g., Bloom, Fisher, & Orme, 2003 ; Borckardt et al., 2008 ; Dattilio et al., 2010 ; Jacobsen & Christensen, 1996 ; Kazdin, 2008 ; Morgan & Morgan, 2001 ; Westen & Bradley, 2005 ; Westen et al., 2004 ). Recent advancements and expanded application of SCEDs indicate a need for updated design and reporting standards.

Many benchmarks in the literature concerning key parameters of the SCED were established well before recent advancements and innovations, such as the suggested minimum number of data points in the baseline phase(s), which remains a disputed area of SCED research (e.g., Center, Skiba, & Casey, 1986; Huitema, 1985; R. R. Jones, Vaught, & Weinrott, 1977; Sharpley, 1987). This article comprises (a) an examination of contemporary SCED methodological and reporting standards; (b) a systematic review of select design, measurement, and statistical characteristics of published SCED research during the past decade; and (c) a broad discussion of the critical aspects of this research to inform methodological improvements and study reporting standards. The reader will garner a fundamental understanding of what constitutes appropriate methodological soundness in single-case experimental research according to the established standards in the field, which can be used to guide the design of future studies, improve the presentation of publishable empirical findings, and inform the peer-review process. The discussion begins with the basic characteristics of the SCED, including an introduction to time-series, daily diary, and EMA strategies, and describes how current reporting and design standards apply to each of these areas of single-case research. Interwoven within this presentation are the results of a systematic review of SCED research published between 2000 and 2010 in peer-reviewed outlets and a discussion of the way in which these findings support, or differ from, existing design and reporting standards and published SCED benchmarks.

Review of Current SCED Guidelines and Reporting Standards

In contrast to experimental group comparison studies, which conform to generally well agreed upon methodological design and reporting guidelines, such as the CONSORT ( Moher, Schulz, Altman, & the CONSORT Group, 2001 ) and TREND ( Des Jarlais, Lyles, & Crepaz, 2004 ) statements for randomized and nonrandomized trials, respectively, there is comparatively much less consensus when it comes to the SCED. Until fairly recently, design and reporting guidelines for single-case experiments were almost entirely absent in the literature and were typically determined by the preferences of a research subspecialty or a particular journal’s editorial board. Factions still exist within the larger field of psychology, as can be seen in the collection of standards presented in this article, particularly in regard to data analytic methods of SCEDs, but fortunately there is budding agreement about certain design and measurement characteristics. A number of task forces, professional groups, and independent experts in the field have recently put forth guidelines; each has a relatively distinct purpose, which likely accounts for some of the discrepancies between them. In what is to be a central theme of this article, researchers are ultimately responsible for thoughtfully and synergistically combining research design, measurement, and analysis aspects of a study.

This review presents the more prominent, comprehensive, and recently established SCED standards. Six sources are discussed: (1) Single-Case Design Technical Documentation from the What Works Clearinghouse (WWC; Kratochwill et al., 2010 ); (2) the APA Division 12 Task Force on Psychological Interventions, with contributions from the Division 12 Task Force on Promotion and Dissemination of Psychological Procedures and the APA Task Force for Psychological Intervention Guidelines (DIV12; presented in Chambless & Hollon, 1998 ; Chambless & Ollendick, 2001 ), adopted and expanded by APA Division 53, the Society for Clinical Child and Adolescent Psychology ( Weisz & Hawley, 1998 , 1999 ); (3) the APA Division 16 Task Force on Evidence-Based Interventions in School Psychology (DIV16; Members of the Task Force on Evidence-Based Interventions in School Psychology. Chair: T. R. Kratochwill, 2003); (4) the National Reading Panel (NRP; National Institute of Child Health and Human Development, 2000 ); (5) the Single-Case Experimental Design Scale ( Tate et al., 2008 ); and (6) the reporting guidelines for EMA put forth by Stone & Shiffman (2002) . Although the specific purposes of each source differ somewhat, the overall aim is to provide researchers and reviewers with agreed-upon criteria to be used in the conduct and evaluation of SCED research. The standards provided by WWC, DIV12, DIV16, and the NRP represent the efforts of task forces. The Tate et al. scale was selected for inclusion in this review because it represents perhaps the only psychometrically validated tool for assessing the rigor of SCED methodology. Stone and Shiffman’s (2002) standards were intended specifically for EMA methods, but many of their criteria also apply to time-series, daily diary, and other repeated-measurement and sampling methods, making them pertinent to this article. The design, measurement, and analysis standards are presented in the later sections of this article and notable concurrences, discrepancies, strengths, and deficiencies are summarized.

Systematic Review Search Procedures and Selection Criteria

Search strategy.

A comprehensive search was performed to identify SCED studies published in peer-reviewed journals that met a priori search and inclusion criteria. First, a computer-based PsycINFO search of articles published between 2000 and 2010 (conducted in July 2011) used the following primary key terms and phrases, which could appear anywhere in the article (asterisks denote that any characters/letters can follow the last character of the search term): alternating treatment design, changing criterion design, experimental case*, multiple baseline design, replicated single-case design, simultaneous treatment design, time-series design. The search was limited to studies published in the English language and those appearing in peer-reviewed journals within the specified publication year range. Additional limiters of the type of article were also used in PsycINFO to increase specificity: The search was limited to methodologies indexed as either quantitative study OR treatment outcome/randomized clinical trial and NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study.

Study selection

The author used a three-phase study selection, screening, and coding procedure to capture as many applicable studies as possible. Phase 1 consisted of the initial systematic review conducted using PsycINFO, which resulted in 571 articles. In Phase 2, titles and abstracts were screened: Articles appearing to use a SCED were retained (451) for Phase 3, in which the author and a trained research assistant read each full-text article and entered the characteristics of interest into a database. At each phase of the screening process, studies that did not use a SCED or that either self-identified as, or were determined to be, quasi-experimental were dropped. Of the 571 original studies, 82 were determined to be quasi-experimental. The definition of a quasi-experimental design used in the screening procedure conforms to the descriptions provided by Kazdin (2010) and Shadish et al. (2002) regarding the necessary components of an experimental design. For example, reversal designs require a minimum of four phases (e.g., ABAB), and multiple baseline designs must demonstrate replication of the effect across at least three conditions (e.g., subjects, settings, behaviors). Sixteen studies were not available in full text in English, and five could not be obtained in full text; these were dropped. The remaining 59 articles not retained for review were determined not to be SCED studies meeting the inclusion criteria, although they had been identified in the PsycINFO search by the specified keyword and methodology terms. In all, 409 studies were selected for review. The sources of the 409 reviewed studies are summarized in Table 1. A complete bibliography of the 571 studies appearing in the initial search, with the included studies marked, is available online as an Appendix or from the author.

Journal Sources of Studies Included in the Systematic Review (N = 409)

Note: Each of the following journal titles contributed 1 study unless otherwise noted in parentheses: Augmentative and Alternative Communication; Acta Colombiana de Psicología; Acta Comportamentalia; Adapted Physical Activity Quarterly (2); Addiction Research and Theory; Advances in Speech Language Pathology; American Annals of the Deaf; American Journal of Education; American Journal of Occupational Therapy; American Journal of Speech-Language Pathology; The American Journal on Addictions; American Journal on Mental Retardation; Applied Ergonomics; Applied Psychophysiology and Biofeedback; Australian Journal of Guidance & Counseling; Australian Psychologist; Autism; The Behavior Analyst; The Behavior Analyst Today; Behavior Analysis in Practice (2); Behavior and Social Issues (2); Behaviour Change (2); Behavioural and Cognitive Psychotherapy; Behaviour Research and Therapy (3); Brain and Language (2); Brain Injury (2); Canadian Journal of Occupational Therapy (2); Canadian Journal of School Psychology; Career Development for Exceptional Individuals; Chinese Mental Health Journal; Clinical Linguistics and Phonetics; Clinical Psychology & Psychotherapy; Cognitive and Behavioral Practice; Cognitive Computation; Cognitive Therapy and Research; Communication Disorders Quarterly; Developmental Medicine & Child Neurology (2); Developmental Neurorehabilitation (2); Disability and Rehabilitation: An International, Multidisciplinary Journal (3); Disability and Rehabilitation: Assistive Technology; Down Syndrome: Research & Practice; Drug and Alcohol Dependence (2); Early Childhood Education Journal (2); Early Childhood Services: An Interdisciplinary Journal of Effectiveness; Educational Psychology (2); Education and Training in Autism and Developmental Disabilities; Electronic Journal of Research in Educational Psychology; Environment and Behavior (2); European Eating Disorders Review; European Journal of Sport Science; European Review of Applied Psychology; Exceptional Children; Exceptionality; Experimental and Clinical Psychopharmacology; Family & Community Health: The Journal of Health Promotion & Maintenance; Headache: The Journal of Head and Face Pain; International Journal of Behavioral Consultation and Therapy (2); International Journal of Disability; Development and Education (2); International Journal of Drug Policy; International Journal of Psychology; International Journal of Speech-Language Pathology; International Psychogeriatrics; Japanese Journal of Behavior Analysis (3); Japanese Journal of Special Education; Journal of Applied Research in Intellectual Disabilities (2); Journal of Applied Sport Psychology (3); Journal of Attention Disorders (2); Journal of Behavior Therapy and Experimental Psychiatry; Journal of Child Psychology and Psychiatry; Journal of Clinical Psychology in Medical Settings; Journal of Clinical Sport Psychology; Journal of Cognitive Psychotherapy; Journal of Consulting and Clinical Psychology (2); Journal of Deaf Studies and Deaf Education; Journal of Educational & Psychological Consultation (2); Journal of Evidence-Based Practices for Schools (2); Journal of the Experimental Analysis of Behavior (2); Journal of General Internal Medicine; Journal of Intellectual and Developmental Disabilities; Journal of Intellectual Disability Research (2); Journal of Medical Speech-Language Pathology; Journal of Neurology, Neurosurgery & Psychiatry; Journal of Paediatrics and Child Health; Journal of Prevention and Intervention in the Community; Journal of Safety Research; 
Journal of School Psychology (3); The Journal of Socio-Economics; The Journal of Special Education; Journal of Speech, Language, and Hearing Research (2); Journal of Sport Behavior; Journal of Substance Abuse Treatment; Journal of the International Neuropsychological Society; Journal of Traumatic Stress; The Journals of Gerontology: Series B: Psychological Sciences and Social Sciences; Language, Speech, and Hearing Services in Schools; Learning Disabilities Research & Practice (2); Learning Disability Quarterly (2); Music Therapy Perspectives; Neurorehabilitation and Neural Repair; Neuropsychological Rehabilitation (2); Pain; Physical Education and Sport Pedagogy (2); Preventive Medicine: An International Journal Devoted to Practice and Theory; Psychological Assessment; Psychological Medicine: A Journal of Research in Psychiatry and the Allied Sciences; The Psychological Record; Reading and Writing; Remedial and Special Education (3); Research and Practice for Persons with Severe Disabilities (2); Restorative Neurology and Neuroscience; School Psychology International; Seminars in Speech and Language; Sleep and Hypnosis; School Psychology Quarterly; Social Work in Health Care; The Sport Psychologist (3); Therapeutic Recreation Journal (2); The Volta Review; Work: Journal of Prevention, Assessment & Rehabilitation.

Coding criteria amplifications

A comprehensive description of the coding criteria for each category in this review is available from the author by request. The primary coding criteria are described here and in later sections of this article.

  • Research design was classified into one of the types discussed later in the section titled Predominant Single-Case Experimental Designs on the basis of the authors’ stated design type. Secondary research designs were then coded when applicable (i.e., mixed designs). Distinctions between primary and secondary research designs were made based on the authors’ description of their study. For example, if an author described the study as a “multiple baseline design with time-series measurement,” the primary research design would be coded as being multiple baseline, and time-series would be coded as the secondary research design.
  • Observer ratings were coded as present when observational coding procedures were described and/or the results of a test of interobserver agreement were reported.
  • Interrater reliability for observer ratings was coded as present in any case in which percent agreement, alpha, kappa, or another appropriate statistic was reported, regardless of the amount of the total data that were examined for agreement.
  • Daily diary, daily self-report, and EMA codes were given when authors explicitly described these procedures in the text by name. Coders did not infer the use of these measurement strategies.
  • The number of baseline observations was either taken directly from the figures provided in the text or counted in graphical displays of the data when this was determined to be a reliable approach. When the number of baseline data points could not be reliably determined from the graphical display of the data, the “unavailable” code was assigned. The “unavailable” code was also assigned when the number of observations was unreported or ambiguous, or when only a range was provided and thus no mean could be determined. Because a number of studies reported only means, the mean number of baseline observations was calculated for each study prior to further descriptive statistical analyses.
  • The coding of the analytic method used in the reviewed studies is discussed later in the section titled Discussion of Review Results and Coding of Analytic Methods .

Results of the Systematic Review

Descriptive statistics of the design, measurement, and analysis characteristics of the reviewed studies are presented in Table 2 . The results and their implications are discussed in the relevant sections throughout the remainder of the article.

Descriptive Statistics of Reviewed SCED Characteristics

Note. % refers to the proportion of reviewed studies that satisfied criteria for this code: for example, the percentage of studies reporting observer ratings.

Discussion of the Systematic Review Results in Context

The SCED is a very flexible methodology and has many variants. Those mentioned here are the building blocks from which other designs are then derived. For those readers interested in the nuances of each design, Barlow et al., (2008) ; Franklin, Allison, and Gorman (1997) ; Kazdin (2010) ; and Kratochwill and Levin (1992) , among others, provide cogent, in-depth discussions. Identifying the appropriate SCED depends upon many factors, including the specifics of the IV, the setting in which the study will be conducted, participant characteristics, the desired or hypothesized outcomes, and the research question(s). Similarly, the researcher’s selection of measurement and analysis techniques is determined by these factors.

Predominant Single-Case Experimental Designs

Alternating/simultaneous designs (6%; primary design of the studies reviewed).

Alternating and simultaneous designs involve an iterative manipulation of the IV(s) across different phases to show that changes in the DV vary systematically as a function of manipulating the IV(s). In these multielement designs, the researcher has the option to alternate the introduction of two or more IVs or to present two or more IVs at the same time. In the alternating variation, the researcher is able to determine the relative impact of two different IVs on the DV when all other conditions are held constant. Another variation of this design is to alternate IVs across various conditions that could be related to the DV (e.g., class period, interventionist). In the simultaneous variation, the IVs are presented at the same time within the same phase of the study.

Changing criterion design (4%)

Changing criterion designs are used to demonstrate a gradual change in the DV over the course of the phase involving the active manipulation of the IV. The criterion for demonstrating that a change has occurred shifts in a stepwise manner as the participant responds to the presence of the manipulated IV. The changing criterion design is particularly useful in applied intervention research for a number of reasons. The IV is continuous and never withdrawn, unlike the strategy used in a reversal design. This is particularly important in situations where removal of a psychological intervention would be detrimental or dangerous to the participant, or would otherwise be infeasible or unethical. The multiple baseline design also does not withdraw the intervention, but it requires replicating the effects of the intervention across participants, settings, or situations. A changing criterion design can be accomplished with one participant in one setting without withholding or withdrawing treatment.

Multiple baseline/combined series design (69%)

The multiple baseline or combined series design can be used to test within-subject change across conditions and often involves multiple participants in a replication context. The multiple baseline design is quite simple in many ways, essentially consisting of a number of repeated, miniature AB experiments or variations thereof. Introduction of the IV is staggered temporally across multiple participants or across multiple within-subject conditions, which allows the researcher to demonstrate that changes in the DV reliably occur only when the IV is introduced, thus controlling for the effects of extraneous factors. Multiple baseline designs can be used both within and across units (i.e., persons or groups of persons). When the baseline phase of each subject begins simultaneously, it is called a concurrent multiple baseline design. In a nonconcurrent variation, baseline periods across subjects begin at different points in time. The multiple baseline design is useful in many settings in which withdrawal of the IV would not be appropriate or when introduction of the IV is hypothesized to result in permanent change that would not reverse when the IV is withdrawn. The major drawback of this design is that the IV must be initially withheld for a period of time to ensure different starting points across the different units in the baseline phase. Depending upon the nature of the research questions, withholding an IV, such as a treatment, could be potentially detrimental to participants.

Reversal designs (17%)

Reversal designs, also known as introduction and withdrawal designs, are denoted as ABAB designs in their simplest form. As the name suggests, the reversal design involves collecting a baseline measure of the DV (the first A phase), introducing the IV (the first B phase), removing the IV while continuing to assess the DV (the second A phase), and then reintroducing the IV (the second B phase). This pattern can be repeated as many times as is necessary to demonstrate an effect or otherwise address the research question. Reversal designs are useful when the manipulation is hypothesized to result in changes in the DV that are expected to reverse or discontinue when the manipulation is not present. Maintenance of an effect is often necessary to uphold the findings of reversal designs. An effect is demonstrated in a reversal design when improvement occurs during the first manipulation phase, compared with the first baseline phase; reverts to or approaches original baseline levels during the second baseline phase, when the manipulation has been withdrawn; and then improves again when the manipulation is reinstated. This pattern of reversal, when the manipulation is introduced and then withdrawn, is essential to attributing changes in the DV to the IV. However, demonstrating maintenance of effects is not incompatible with a reversal design, in which the DV is hypothesized to reverse when the IV is withdrawn (Kazdin, 2010). Maintenance is demonstrated by repeating introduction–withdrawal segments until improvement in the DV becomes permanent even when the IV is withdrawn. There is not always a need to demonstrate maintenance, nor is it always possible or desirable, but it is paramount in learning and intervention research contexts.

Mixed designs (10%)

Mixed designs include a combination of more than one SCED (e.g., a reversal design embedded within a multiple baseline) or an SCED embedded within a group design (e.g., a randomized controlled trial comparing two groups of multiple baseline experiments). Mixed designs afford the researcher even greater flexibility in designing a study to address complex psychological hypotheses while capitalizing on the strengths of the various designs. See Kazdin (2010) for a discussion of the variations and utility of mixed designs.

Related Nonexperimental Designs

Quasi-experimental designs.

In contrast to the designs previously described, all of which constitute “true experiments” ( Kazdin, 2010 ; Shadish et al., 2002 ), in quasi-experimental designs the conditions of a true experiment (e.g., active manipulation of the IV, replication of the effect) are approximated and are not readily under the control of the researcher. Because the focus of this article is on experimental designs, quasi-experiments are not discussed in detail; instead the reader is referred to Kazdin (2010) and Shadish et al. (2002) .

Ecological and naturalistic single-case designs

For a single-case design to be experimental, there must be active manipulation of the IV, but in some applications, such as those that might be used in social and personality psychology, the researcher might be interested in measuring naturally occurring phenomena and examining their temporal relationships. Thus, the researcher will not use a manipulation. An example of this type of research might be a study about the temporal relationship between alcohol consumption and depressed mood, which can be measured reliably using EMA methods. Psychotherapy process researchers also use this type of design to assess dyadic relationship dynamics between therapists and clients (e.g., Tschacher & Ramseyer, 2009 ).

Research Design Standards

Each of the reviewed standards provides some degree of direction regarding acceptable research designs. The WWC provides the most detailed and specific requirements regarding design characteristics. The guidelines presented in Tables 3, 4, and 5 are consistent with the methodological rigor necessary to meet the WWC distinction “meets standards.” The WWC also provides less-stringent standards for a “meets standards with reservations” distinction. When minimum criteria in the design, measurement, or analysis sections of a study are not met, it is rated “does not meet standards” (Kratochwill et al., 2010). Many SCEDs are acceptable within the standards of DIV12, DIV16, the NRP, and the Tate et al. SCED scale. DIV12 specifies that replication occur across a minimum of three successive cases, which differs from the WWC specifications, which allow the three replications to occur within a single-subject design and not necessarily across multiple subjects. DIV16 does not require, but seems to prefer, a multiple baseline design with a between-subject replication. Tate et al. (2008) state that the “design allows for the examination of cause and effect relationships to demonstrate efficacy” (p. 400). Determining whether a design meets this requirement is left up to the evaluator, who might then refer to one of the other standards or another source for direction.

Research Design Standards and Guidelines

Measurement and Assessment Standards and Guidelines

Analysis Standards and Guidelines

The Stone and Shiffman (2002) standards for EMA are concerned almost entirely with the reporting of measurement characteristics and less so with research design. One way in which these standards differ from those of other sources is in the active manipulation of the IV. Many research questions in EMA, daily diary, and time-series designs are concerned with naturally occurring phenomena, and a researcher manipulation would run counter to this aim. The EMA standards become important when selecting an appropriate measurement strategy within the SCED. In EMA applications, as is also true in some other time-series and daily diary designs, researcher manipulation occurs as a function of the sampling interval in which DVs of interest are measured according to fixed time schedules (e.g., reporting occurs at the end of each day), random time schedules (e.g., the data collection device prompts the participant to respond at random intervals throughout the day), or on an event-based schedule (e.g., reporting occurs after a specified event takes place).
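To make the schedule types concrete, the sketch below (purely illustrative; the function name, prompt window, and prompt count are assumptions rather than part of any cited standard) generates one day’s random-interval prompt times of the kind described above.

```python
import random
from datetime import datetime, timedelta

def random_prompt_schedule(n_prompts=5, start_hour=9, end_hour=21, seed=None):
    """Draw one day's random-interval EMA prompt times (illustrative only).

    Prompt times are sampled uniformly, without replacement, from the minutes
    between start_hour and end_hour and returned in chronological order.
    """
    rng = random.Random(seed)
    day_start = datetime(2024, 1, 1, start_hour)  # arbitrary reference date
    window_minutes = (end_hour - start_hour) * 60
    minutes = sorted(rng.sample(range(window_minutes), n_prompts))
    return [day_start + timedelta(minutes=m) for m in minutes]

# Example: five prompts between 09:00 and 21:00
for prompt in random_prompt_schedule(seed=1):
    print(prompt.strftime("%H:%M"))
```

A fixed schedule would instead place prompts at predetermined times (e.g., the end of each day), and an event-based schedule would be triggered by the participant after a specified event occurs.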

Measurement

The basic measurement requirement of the SCED is a repeated assessment of the DV across each phase of the design in order to draw valid inferences regarding the effect of the IV on the DV. In other applications, such as those used by personality and social psychology researchers to study various human phenomena ( Bolger et al., 2003 ; Reis & Gable, 2000 ), sampling strategies vary widely depending on the topic area under investigation. Regardless of the research area, SCEDs are most typically concerned with within-person change and processes and involve a time-based strategy, most commonly to assess global daily averages or peak daily levels of the DV. Many sampling strategies, such as time-series, in which reporting occurs at uniform intervals or on event-based, fixed, or variable schedules, are also appropriate measurement methods and are common in psychological research (see Bolger et al., 2003 ).

Repeated-measurement methods permit the natural, even spontaneous, reporting of information ( Reis, 1994 ), which reduces the biases of retrospection by minimizing the amount of time elapsed between an experience and the account of this experience ( Bolger et al., 2003 ). Shiffman et al. (2008) aptly noted that the majority of research in the field of psychology relies heavily on retrospective assessment measures, even though retrospective reports have been found to be susceptible to state-congruent recall (e.g., Bower, 1981 ) and a tendency to report peak levels of the experience instead of giving credence to temporal fluctuations ( Redelmeier & Kahneman, 1996 ; Stone, Broderick, Kaell, Deles-Paul, & Porter, 2000 ). Furthermore, Shiffman et al. (1997) demonstrated that subjective aggregate accounts were a poor fit to daily reported experiences, which can be attributed to reductions in measurement error resulting in increased validity and reliability of the daily reports.

The necessity of measuring at least one DV repeatedly means that the selected assessment method, instrument, and/or construct must be sensitive to change over time and capable of reliably and validly capturing change. Horner et al. (2005) discuss the important features of outcome measures selected for use in these types of designs. Kazdin (2010) suggests that measures be dimensional, because dimensional measures can detect effects more readily than categorical or binary measures can. Although using an established measure or scale, such as the Outcome Questionnaire System (M. J. Lambert, Hansen, & Harmon, 2010), provides empirically validated items for assessing various outcomes, most validation studies of this type of instrument involve between-subject designs, so there is no guarantee that these measures are reliable and valid for assessing within-person variability. Borsboom, Mellenbergh, and van Heerden (2003) suggest that researchers adapting validated measures should consider whether the items they propose using have a factor structure within subjects similar to that obtained between subjects. This is one of the reasons that SCEDs often use observational assessments from multiple sources and report the interrater reliability of the measure. Self-report measures are acceptable practice in some circles, but generally additional assessment methods or informants are necessary to uphold the highest methodological standards. The results of this review indicate that the majority of studies include observational measurement (76.0%). Within those studies, nearly all (97.1%) reported interrater reliability procedures and results. The results within each design were similar, with the exception of time-series designs, which used observer ratings in only half of the reviewed studies.

Time-series

Time-series designs are defined by repeated measurement of variables of interest over a period of time (Box & Jenkins, 1970). Time-series measurement most often occurs at uniform intervals; however, this is no longer a constraint of time-series designs (see Harvey, 2001). Although uniform-interval reporting is not necessary in SCED research, repeated measures often occur at uniform intervals, such as once each day or each week, which constitutes a time-series design. The time-series design has been used in various basic science applications (Scollon, Kim-Prieto, & Diener, 2003) across nearly all subspecialties in psychology (e.g., Bolger et al., 2003; Piasecki et al., 2007; Soliday et al., 2002; for a review, see Reis & Gable, 2000). The basic time-series formula for a two-phase (AB) data stream is presented in Equation 1. In this formula, α represents the step function of the data stream, equal to 0 at times i = 1, 2, 3, …, n1 and 1 at times i = n1 + 1, n1 + 2, n1 + 3, …, n; S represents the change in level between the first and second phases (the intercept shift in a two-phase data stream); n1 is the number of observations in the baseline phase; n is the total number of data points in the data stream; i represents time; and ε_i = ρε_(i−1) + e_i, which expresses the relationship between the autoregressive function (ρ) and the error distribution of the data stream.
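Equation 1 itself is not reproduced in this text. The following is a plausible rendering reconstructed from the symbol definitions above; the baseline-level term L is an assumed label, and the published form may differ.

```latex
% Plausible reconstruction of Equation 1 from the definitions in the text.
% L (baseline level) is an assumed label; S is the change in level between phases.
\begin{equation}
  y_i = L + S\,\alpha_i + \varepsilon_i, \qquad
  \alpha_i =
    \begin{cases}
      0, & i = 1, 2, \dots, n_1 \\
      1, & i = n_1 + 1, \dots, n
    \end{cases}
  \qquad
  \varepsilon_i = \rho\,\varepsilon_{i-1} + e_i
\end{equation}
```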

Time-series formulas become increasingly complex when seasonality and autoregressive processes are modeled in the analytic procedures, but these are rarely of concern for short time-series data streams in SCEDs. For a detailed description of other time-series design and analysis issues, see Borckardt et al. (2008) , Box and Jenkins (1970) , Crosbie (1993) , R. R. Jones et al. (1977) , and Velicer and Fava (2003) .

Time-series and other repeated-measures methodologies also enable examination of temporal effects. Borckardt et al. (2008) and others have noted that time-series designs have the potential to reveal how change occurs, not simply whether it occurs. This distinction is what most interested Skinner (1938), but it is often overlooked by today’s researchers in favor of group designs, which Skinner felt obscured the process of change. In intervention and psychopathology research, time-series designs can assess mediators of change (Doss & Atkins, 2006), treatment processes (Stout, 2007; Tschacher & Ramseyer, 2009), and the relationships between psychological symptoms (e.g., Alloy, Just, & Panzarella, 1997; Hanson & Chen, 2010; Oslin, Cary, Slaymaker, Colleran, & Blow, 2009), and might be capable of revealing mechanisms of change (Kazdin, 2007, 2009, 2010). Between- and within-subject SCEDs with repeated measurements enable researchers to examine similarities and differences in the course of change, both during and as a result of manipulating an IV. Temporal effects have been largely overlooked in many areas of psychological science (Bolger et al., 2003): Examining temporal relationships is sorely needed to further our understanding of the etiology and amplification of numerous psychological phenomena.

Time-series studies were very infrequently found in this literature search (2%). Time-series studies traditionally occur in subfields of psychology in which single-case research is not often used (e.g., personality, physiological/biological). Recent advances in methods for collecting and analyzing time-series data (e.g., Borckardt et al., 2008 ) could expand the use of time-series methodology in the SCED community. One problem with drawing firm conclusions from this particular review finding is a semantic factor: Time-series is a specific term reserved for measurement occurring at a uniform interval. However, SCED research appears to not yet have adopted this language when referring to data collected in this fashion. When time-series data analytic methods are not used, the matter of measurement interval is of less importance and might not need to be specified or described as a time-series. An interesting extension of this work would be to examine SCED research that used time-series measurement strategies but did not label it as such. This is important because then it could be determined how many SCEDs could be analyzed with time-series statistical methods.

Daily diary and ecological momentary assessment methods

EMA and daily diary approaches represent methodological procedures for collecting repeated measurements in time-series and non-time-series experiments, which are also known as experience sampling. Presenting an in-depth discussion of the nuances of these sampling techniques is well beyond the scope of this paper. The reader is referred to the following review articles: daily diary ( Bolger et al., 2003 ; Reis & Gable, 2000 ; Thiele, Laireiter, & Baumann, 2002 ), and EMA ( Shiffman et al., 2008 ). Experience sampling in psychology has burgeoned in the past two decades as technological advances have permitted more precise and immediate reporting by participants (e.g., Internet-based, two-way pagers, cellular telephones, handheld computers) than do paper and pencil methods (for reviews see Barrett & Barrett, 2001 ; Shiffman & Stone, 1998 ). Both methods have practical limitations and advantages. For example, electronic methods are more costly and may exclude certain subjects from participating in the study, either because they do not have access to the necessary technology or they do not have the familiarity or savvy to successfully complete reporting. Electronic data collection methods enable the researcher to prompt responses at random or predetermined intervals and also accurately assess compliance. Paper and pencil methods have been criticized for their inability to reliably track respondents’ compliance: Palermo, Valenzuela, and Stork (2004) found better compliance with electronic diaries than with paper and pencil. On the other hand, Green, Rafaeli, Bolger, Shrout, & Reis (2006) demonstrated the psychometric data structure equivalence between these two methods, suggesting that the data collected in either method will yield similar statistical results given comparable compliance rates.

Daily diary/daily self-report and EMA measurement were somewhat rarely represented in this review, occurring in only 6.1% of the total studies. EMA methods had been used in only one of the reviewed studies. The recent proliferation of EMA and daily diary studies in psychology reported by others ( Bolger et al., 2003 ; Piasecki et al., 2007 ; Shiffman et al., 2008 ) suggests that these methods have not yet reached SCED researchers, which could in part have resulted from the long-held supremacy of observational measurement in fields that commonly practice single-case research.

Measurement Standards

As was previously mentioned, measurement in SCEDs requires the reliable assessment of change over time. As illustrated in Table 4 , DIV16 and the NRP explicitly require that reliability of all measures be reported. DIV12 provides little direction in the selection of the measurement instrument, except to require that three or more clinically important behaviors with relative independence be assessed. Similarly, the only item concerned with measurement on the Tate et al. scale specifies assessing behaviors consistent with the target of the intervention. The WWC and the Tate et al. scale require at least two independent assessors of the DV and that interrater reliability meeting minimum established thresholds be reported. Furthermore, WWC requires that interrater reliability be assessed on at least 20% of the data in each phase and in each condition. DIV16 expects that assessment of the outcome measures will be multisource and multimethod, when applicable. The interval of measurement is not specified by any of the reviewed sources. The WWC and the Tate et al. scale require that DVs be measured repeatedly across phases (e.g., baseline and treatment), which is a typical requirement of a SCED. The NRP asks that the time points at which DV measurement occurred be reported.

The baseline measurement represents one of the most crucial design elements of the SCED. Because subjects provide their own data for comparison, gathering a representative, stable sampling of behavior before manipulating the IV is essential to accurately inferring an effect. Some researchers have reported the typical length of the baseline period to range from 3 to 12 observations in intervention research applications (e.g., Center et al., 1986; Huitema, 1985; R. R. Jones et al., 1977; Sharpley, 1987); Huitema’s (1985) review of 881 experiments published in the Journal of Applied Behavior Analysis found a modal number of three to four baseline points. Center et al. (1986) suggested five as the minimum number of baseline measurements needed to accurately estimate autocorrelation. Longer baseline periods increase the likelihood of a representative measurement of the DVs, which has been found to increase the validity of the effects and reduce bias resulting from autocorrelation (Huitema & McKean, 1994). The results of this review are largely consistent with those of previous researchers: The mean number of baseline observations was 10.22 (SD = 9.59), and 6 was the modal number of observations. Baseline data were available in 77.8% of the reviewed studies. Although the baseline assessment has tremendous bearing on the results of a SCED study, it was often difficult to locate the exact number of baseline data points, and the number of data points assessed across all phases of the study was similarly difficult to identify.

The WWC, DIV12, and DIV16 agree that a minimum of three data points during the baseline is necessary. However, to receive the highest rating by the WWC, five data points are necessary in each phase, including the baseline and any subsequent withdrawal baselines as would occur in a reversal design. DIV16 explicitly states that more than three points are preferred and further stipulates that the baseline must demonstrate stability (i.e., limited variability), absence of overlap between the baseline and other phases, absence of a trend, and that the level of the baseline measurement is severe enough to warrant intervention; each of these aspects of the data is important in inferential accuracy. Detrending techniques can be used to address baseline data trend. The integration option in ARIMA-based modeling and the empirical mode decomposition method ( Wu, Huang, Long, & Peng, 2007 ) are two sophisticated detrending techniques. In regression-based analytic methods, detrending can be accomplished by simply regressing each variable in the model on time (i.e., the residuals become the detrended series), which is analogous to adding a linear, exponential, or quadratic term to the regression equation.
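As a rough sketch of the regression-based detrending just described (illustrative only; the function name and sample series are hypothetical), a variable can be regressed on time and the residuals kept as the detrended series:

```python
import numpy as np

def detrend_linear(y):
    """Regress a series on time and return the residuals as the detrended series."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    X = np.column_stack([np.ones_like(t), t])     # intercept + linear time term
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least-squares fit
    return y - X @ beta                           # residuals = detrended series

# Hypothetical baseline series with an upward trend
baseline = [2.0, 3.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0]
print(detrend_linear(baseline))
```

A quadratic or exponential trend could be handled analogously by adding the corresponding term to the design matrix, as noted above.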

The NRP does not specify a minimum number of data points, nor does the Tate et al. scale, which requires only a sufficient sampling of baseline behavior. Although the mean and modal numbers of baseline observations found in this review are well within these parameters, seven (1.7%) studies reported mean baselines of fewer than three data points.

Establishing a uniform minimum number of required baseline observations would provide researchers and reviewers with only a starting guide. The baseline phase is important in SCED research because it establishes a trend that can then be compared with that of subsequent phases. Although a minimum number of observations might be required to meet standards, many more might be necessary to establish a representative baseline when the data are variable or show a trend in the direction of the expected effect. The selected data analytic approach also has some bearing on the number of necessary baseline observations. This is discussed further in the Analysis section.

Reporting of repeated measurements

Stone and Shiffman (2002) provide a comprehensive set of guidelines for the reporting of EMA data, which can also be applied to other repeated-measurement strategies. Because the application of EMA is widespread and not confined to specific research designs, Stone and Shiffman intentionally place few restraints on researchers regarding selection of the DV and the reporter, which is determined by the research question under investigation. The methods of measurement, however, are specified in detail: Descriptions of prompting, recording of responses, participant-initiated entries, and the data acquisition interface (e.g., paper and pencil diary, PDA, cellular telephone) ought to be provided with sufficient detail for replication. Because EMA specifically, and time-series/daily diary methods similarly, are primarily concerned with the interval of assessment, Stone and Shiffman suggest reporting the density and schedule of assessment. The approach is generally determined by the nature of the research question and pragmatic considerations, such as access to electronic data collection devices at certain times of the day and participant burden. Compliance and missing data concerns are present in any longitudinal research design, but they are of particular importance in repeated-measurement applications with frequent measurement. When the research question pertains to temporal effects, compliance becomes paramount, and timely, immediate responding is necessary. For this reason, compliance decisions, rates of missing data, and missing data management techniques must be reported. The effect of missing data in time-series data streams has been the topic of recent research in the social sciences (e.g., Smith, Borckardt, & Nash, in press ; Velicer & Colby, 2005a , 2005b ). The results and implications of these and other missing data studies are discussed in the next section.

Analysis of SCED Data

Visual analysis.

Experts in the field generally agree about the majority of critical single-case experiment design and measurement characteristics. Analysis, on the other hand, is an area of significant disagreement, yet it has also received extensive recent attention and advancement. Debate regarding the appropriateness and accuracy of various methods for analyzing SCED data, the interpretation of single-case effect sizes, and other concerns vital to the validity of SCED results has been ongoing for decades, and no clear consensus has been reached. Visual analysis, following systematic procedures such as those provided by Franklin, Gorman, Beasley, and Allison (1997) and Parsonson and Baer (1978), remains the standard by which SCED data are most commonly analyzed (Parker, Cryer, & Byrns, 2006). Visual analysis can arguably be applied to all SCEDs. However, a number of baseline data characteristics must be met for effects obtained through visual analysis to be valid and reliable. The baseline phase must be relatively stable; be free of significant trend, particularly in the hypothesized direction of the effect; have minimal overlap of data with subsequent phases; and include a sufficient sampling of behavior to be considered representative (Franklin, Gorman, et al., 1997; Parsonson & Baer, 1978). Parker et al. (2006) describe the effect of baseline trend on visual analysis and offer a technique to control baseline trend. Kazdin (2010) suggests using statistical analysis when a trend or significant variability appears in the baseline phase, two conditions that ought to preclude the use of visual analysis techniques. Visual analysis methods are especially adept at determining intervention effects and can be of particular relevance in real-world applications (e.g., Borckardt et al., 2008; Kratochwill, Levin, Horner, & Swoboda, 2011).

However, visual analysis has its detractors. It has been shown to be inconsistent, to be affected by autocorrelation, and to overestimate effects (e.g., Matyas & Greenwood, 1990). Relying on visual analysis to estimate an effect also precludes the results of SCED research from being included in meta-analyses and makes it very difficult to compare results with the effect sizes generated by statistical methods. Yet visual analysis persists, in large part because SCED researchers are familiar with these methods, are generally unfamiliar with statistical approaches, and lack agreement about the appropriateness of those approaches. Still, leading experts in single-case analysis champion the use of statistical methods alongside visual analysis whenever it is appropriate to do so (Kratochwill et al., 2011).

Statistical analysis

Statistical analysis of SCED data consists generally of an attempt to address one or more of three broad research questions: (1) Does introduction/manipulation of the IV result in statistically significant change in the level of the DV (level-change or phase-effect analysis)? (2) Does introduction/manipulation of the IV result in statistically significant change in the slope of the DV over time (slope-change analysis)? and (3) Do meaningful relationships exist between the trajectory of the DV and other potential covariates? Level- and slope-change analyses are relevant to intervention effectiveness studies and other research questions in which the IV is expected to result in changes in the DV in a particular direction. Visual analysis methods are most adept at addressing research questions pertaining to changes in level and slope (Questions 1 and 2), most often using some form of graphical representation and standardized computation of a mean level or trend line within and between each phase of interest (e.g., Horner & Spaulding, 2010 ; Kratochwill et al., 2011 ; Matyas & Greenwood, 1990 ). Research questions in other areas of psychological science might address the relationship between DVs or the slopes of DVs (Question 3). A number of sophisticated modeling approaches (e.g., cross-lag, multilevel, panel, growth mixture, latent class analysis) may be used for this type of question, and some are discussed in greater detail later in this section. However, a discussion about the nuances of this type of analysis and all their possible methods is well beyond the scope of this article.
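A minimal sketch of a level- and slope-change (phase-effect) analysis for a hypothetical two-phase AB data stream appears below; it uses ordinary least squares, ignores the autocorrelation issues discussed later, and all data and variable names are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical AB data stream: 5 baseline observations followed by 7 treatment observations
y = np.array([3, 4, 3, 4, 3, 5, 6, 7, 7, 8, 8, 9], dtype=float)
n1 = 5                                             # number of baseline observations
t = np.arange(len(y), dtype=float)                 # time index
phase = (t >= n1).astype(float)                    # step function: 0 = baseline, 1 = treatment
time_in_phase = np.where(phase == 1, t - n1, 0.0)  # time elapsed since the IV was introduced

# The coefficient on `phase` estimates the change in level (Question 1);
# the coefficient on `time_in_phase` estimates the change in slope (Question 2).
X = sm.add_constant(np.column_stack([t, phase, time_in_phase]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # intercept, baseline trend, level change, slope change
```

Questions about relationships between DV trajectories (Question 3) call for the multilevel or latent variable models mentioned above rather than a single regression of this kind.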

The statistical analysis of SCEDs is a contentious issue in the field. Not only is there no agreed-upon statistical method, but the practice of statistical analysis in the context of the SCED is viewed by some as unnecessary (see Shadish, Rindskopf, & Hedges, 2008 ). Historical trends in the prevalence of statistical analysis usage by SCED researchers are revealing: Busk & Marascuilo (1992) found that only 10% of the published single-case studies they reviewed used statistical analysis; Brossart, Parker, Olson, & Mahadevan (2006) estimated that this figure had roughly doubled by 2006. A range of concerns regarding single-case effect size calculation and interpretation is discussed in significant detail elsewhere (e.g., Campbell, 2004 ; Cohen, 1994 ; Ferron & Sentovich, 2002 ; Ferron & Ware, 1995 ; Kirk, 1996 ; Manolov & Solanas, 2008 ; Olive & Smith, 2005 ; Parker & Brossart, 2003 ; Robey et al., 1999 ; Smith et al., in press ; Velicer & Fava, 2003 ). One concern is the lack of a clearly superior method across datasets. Although statistical methods for analyzing SCEDs abound, few studies have examined their comparative performance with the same dataset. The most recent studies of this kind, performed by Brossart et al. (2006) , Campbell (2004) , Parker and Brossart (2003) , and Parker and Vannest (2009) , found that the more promising available statistical analysis methods yielded moderately different results on the same data series, which led them to conclude that each available method is equipped to adequately address only a relatively narrow spectrum of data. Given these findings, analysts need to select an appropriate model for the research questions and data structure, being mindful of how modeling results can be influenced by extraneous factors.

The current standards unfortunately provide little guidance in the way of statistical analysis options. This article presents an admittedly cursory introduction to available statistical methods; many others are not covered in this review. The following articles provide more in-depth discussion and description of other methods: Barlow et al. (2008) ; Franklin et al. (1997) ; Kazdin (2010) ; and Kratochwill and Levin (1992 , 2010 ). Shadish et al. (2008) summarize more recently developed methods. Similarly, a Special Issue of Evidence-Based Communication Assessment and Intervention (2008, Volume 2) provides articles and discussion of the more promising statistical methods for SCED analysis. An introduction to autocorrelation and its implications for statistical analysis is necessary before specific analytic methods can be discussed. It is also pertinent at this time to discuss the implications of missing data.

Autocorrelation

Many repeated measurements within a single subject or unit create a situation that most psychological researchers are unaccustomed to dealing with: autocorrelated data, which is the nonindependence of sequential observations, also known as serial dependence. Basic and advanced discussions of autocorrelation in single-subject data can be found in Borckardt et al. (2008) , Huitema (1985) , and Marshall (1980) , and discussions of autocorrelation in multilevel models can be found in Snijders and Bosker (1999) and Diggle and Liang (2001) . Along with trend and seasonal variation, autocorrelation is one example of the internal structure of repeated measurements. In the social sciences, autocorrelated data occur most naturally in the fields of physiological psychology, econometrics, and finance, where each phase of interest has potentially hundreds or even thousands of observations that are tightly packed across time (e.g., electroencephalography actuarial data, financial market indices). Applied SCED research in most areas of psychology is more likely to have measurement intervals of day, week, or hour.

Autocorrelation is a direct result of the repeated-measurement requirements of the SCED, but its effect is most noticeable and problematic when one is attempting to analyze these data. Many commonly used data analytic approaches, such as analysis of variance, assume independence of observations and can produce spurious results when the data are nonindependent. Even statistically insignificant autocorrelation estimates are generally viewed as sufficient to cause inferential bias when conventional statistics are used (e.g., Busk & Marascuilo, 1988 ; R. R. Jones et al., 1977 ; Matyas & Greenwood, 1990 ). The effect of autocorrelation on statistical inference in single-case applications has also been known for quite some time (e.g., R. R. Jones et al., 1977 ; Kanfer, 1970 ; Kazdin, 1981 ; Marshall, 1980 ). The findings of recent simulation studies of single-subject data streams indicate that autocorrelation is a nontrivial matter. For example, Manolov and Solanas (2008) determined that calculated effect sizes were linearly related to the autocorrelation of the data stream, and Smith et al. (in press) demonstrated that autocorrelation estimates in the vicinity of 0.80 negatively affect the ability to correctly infer a significant level-change effect using a standardized mean differences method. Huitema and colleagues (e.g., Huitema, 1985 ; Huitema & McKean, 1994 ) argued that autocorrelation is rarely a concern in applied research. Huitema’s methods and conclusions have been questioned and opposing data have been published (e.g., Allison & Gorman, 1993 ; Matyas & Greenwood, 1990 ; Robey et al., 1999 ), resulting in abandonment of the position that autocorrelation can be conscionably ignored without compromising the validity of the statistical procedures. Procedures for removing autocorrelation in the data stream prior to calculating effect sizes are offered as one option: One of the more promising analysis methods, autoregressive integrated moving averages (discussed later in this article), was specifically designed to remove the internal structure of time-series data, such as autocorrelation, trend, and seasonality ( Box & Jenkins, 1970 ; Tiao & Box, 1981 ).
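As an illustration of the serial dependence at issue, the sketch below computes a lag-1 autocorrelation estimate for a simulated first-order autoregressive data stream. The data, the generating value of 0.80, and the variable names are assumptions made for demonstration only.

```python
import numpy as np

def lag1_autocorrelation(y):
    """Lag-1 autocorrelation (serial dependence) of a single data stream."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return np.sum(d[:-1] * d[1:]) / np.sum(d ** 2)

# Simulate an AR(1) data stream whose true lag-1 autocorrelation is .80
rng = np.random.default_rng(42)
n, phi = 30, 0.80
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

print(round(lag1_autocorrelation(y), 2))  # sample estimate of serial dependence
```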

Missing observations

Another concern inherent in repeated-measures designs is missing data. Daily diary and EMA methods are intended to reduce the risk of retrospection error by eliciting accurate, real-time information ( Bolger et al., 2003 ). However, these methods are subject to missing data as a result of honest forgetfulness, not possessing the diary collection tool at the specified time of collection, and intentional or systematic noncompliance. With paper and pencil diaries and some electronic methods, subjects might be able to complete missed entries retrospectively, defeating the temporal benefits of these assessment strategies ( Bolger et al., 2003 ). Methods of managing noncompliance through the study design and measurement methods include training the subject to use the data collection device appropriately, using technology to prompt responding and track the time of response, and providing incentives to participants for timely compliance (for additional discussion of this topic, see Bolger et al., 2003 ; Shiffman & Stone, 1998 ).

Even when efforts are made to maximize compliance during the conduct of the research, the problem of missing data is often unavoidable. Numerous approaches exist for handling missing observations in group multivariate designs (e.g., Horton & Kleinman, 2007 ; Ibrahim, Chen, Lipsitz, & Herring, 2005 ). Ragunathan (2004) and others concluded that full information and raw data maximum likelihood methods are preferable. Velicer and Colby (2005a , 2005b ) established the superiority of maximum likelihood methods over listwise deletion, mean of adjacent observations, and series mean substitution in the estimation of various critical time-series data parameters. Smith et al. (in press) extended these findings regarding the effect of missing data on inferential precision. They found that managing missing data with the EM procedure ( Dempster, Laird, & Rubin, 1977 ), a maximum likelihood algorithm, did not affect one’s ability to correctly infer a significant effect. However, lag-1 autocorrelation estimates in the vicinity of 0.80 resulted in insufficient power sensitivity (< 0.80), regardless of the proportion of missing data (10%, 20%, 30%, or 40%). 1 Although maximum likelihood methods have garnered some empirical support, methodological strategies that minimize missing data, particularly systematically missing data, are preferable to post-hoc statistical remedies.
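The following sketch illustrates one maximum likelihood route to parameter estimation when observations are missing, using the Kalman filter machinery in Python's statsmodels; this is an assumption for illustration rather than the EM procedure evaluated by Smith et al., and the data are simulated. It contrasts retaining the missing values in place with listwise deletion, which closes the gaps in time and can distort estimates of serial dependence.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulate a short autocorrelated data stream, then remove 20% of observations
rng = np.random.default_rng(7)
n, phi = 40, 0.6
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()
y = pd.Series(y)
missing = rng.choice(n, size=int(0.2 * n), replace=False)
y_obs = y.copy()
y_obs.iloc[missing] = np.nan  # missing observations stay in place as NaN

# Maximum likelihood estimation via the Kalman filter; NaNs are handled
# without deleting rows, so the time ordering of the stream is preserved.
res = SARIMAX(y_obs, order=(1, 0, 0), trend="c").fit(disp=False)
print(res.params["ar.L1"])   # lag-1 autoregressive estimate despite missing data

# Listwise deletion, by contrast, closes the gaps between observations
res_del = SARIMAX(y_obs.dropna().reset_index(drop=True),
                  order=(1, 0, 0), trend="c").fit(disp=False)
print(res_del.params["ar.L1"])
```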

Nonnormal distribution of data

In addition to the autocorrelated nature of SCED data, typical measurement methods also present analytic challenges. Many statistical methods, particularly those involving model finding, assume that the data are normally distributed. This is often not satisfied in SCED research when measurements involve count data, observer-rated behaviors, and other, similar metrics that result in skewed distributions. Techniques are available to manage nonnormal distributions in regression-based analysis, such as zero-inflated Poisson regression ( D. Lambert, 1992 ) and negative binomial regression ( Gardner, Mulvey, & Shaw, 1995 ), but many other statistical analysis methods do not include these sophisticated techniques. A skewed data distribution is perhaps one of the reasons Kazdin (2010) suggests not using count, categorical, or ordinal measurement methods.
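As a brief illustration of a count-appropriate model, the sketch below fits a negative binomial regression of a simulated behavior count on a phase indicator using Python's statsmodels; the data, the package choice, and the variable names are assumptions for demonstration, and this simple sketch ignores the serial dependence discussed above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated daily counts of a target behavior across baseline and intervention
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "count": np.concatenate([
        rng.negative_binomial(n=2, p=0.3, size=12),   # baseline phase
        rng.negative_binomial(n=2, p=0.6, size=18),   # intervention phase
    ]),
    "phase": [0] * 12 + [1] * 18,   # 0 = baseline, 1 = intervention
})

# Negative binomial regression of the count outcome on the phase indicator;
# exp(coefficient) is the rate ratio for intervention relative to baseline.
model = smf.negativebinomial("count ~ phase", data=df).fit(disp=False)
print(model.summary())
print(np.exp(model.params["phase"]))
```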

Available statistical analysis methods

Following is a basic introduction to the more promising and prevalent analytic methods for SCED research. Because there is little consensus regarding the superiority of any single method, the burden unfortunately falls on the researcher to select a method capable of addressing the research question and handling the data involved in the study. Some indications and contraindications are provided for each method presented here.

Multilevel and structural equation modeling

Multilevel modeling (MLM; e.g., Schmidt, Perels, & Schmitz, 2010 ) techniques represent the state of the art among parametric approaches to SCED analysis, particularly when synthesizing SCED results ( Shadish et al., 2008 ). MLM and related latent growth curve and factor mixture methods in structural equation modeling (SEM; e.g., Lubke & Muthén, 2005 ; B. O. Muthén & Curran, 1997 ) are particularly effective for evaluating trajectories and slopes in longitudinal data and relating changes to potential covariates. MLM and related hierarchical linear models (HLM) can also illuminate the relationship between the trajectories of different variables under investigation and clarify whether or not these relationships differ amongst the subjects in the study. Time-series and cross-lag analyses can also be used in MLM and SEM ( Chow, Ho, Hamaker, & Dolan, 2010 ; du Toit & Browne, 2007 ). However, they generally require sophisticated model-fitting techniques, making them difficult for many social scientists to implement. The structure (autocorrelation) and trend of the data can also complicate many MLM methods. The common, short data streams in SCED research and the small number of subjects also present problems for MLM and SEM approaches, which were developed for data with substantially more observations per subject when subjects are few, or for substantially more subjects when each subject contributes relatively few data points. Still, MLM and related techniques arguably represent the most promising analytic methods.

A number of software options 2 exist for SEM. Popular statistical packages in the social sciences provide SEM options, such as PROC CALIS in SAS ( SAS Institute Inc., 2008 ), the AMOS module ( Arbuckle, 2006 ) of SPSS ( SPSS Statistics, 2011 ), and the sem package for R ( R Development Core Team, 2005 ), the use of which is described by Fox (2006) . A number of stand-alone software options are also available for SEM applications, including Mplus ( L. K. Muthén & Muthén, 2010 ) and Stata ( StataCorp., 2011 ). Each of these programs also provides options for estimating multilevel/hierarchical models (for a review of using these programs for MLM analysis, see Albright & Marinova, 2010 ). Hierarchical linear and nonlinear modeling can also be accomplished using the HLM 7 program ( Raudenbush, Bryk, & Congdon, 2011 ).
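As a complement to the packages listed above, the following minimal sketch fits a two-level model to simulated multiple-baseline data using the MixedLM routine in Python's statsmodels, a package not discussed in this article; the data, the staggered phase-change points, and the variable names are assumptions for demonstration. Within-case autocorrelation is not modeled here, which, as noted above, can complicate MLM approaches.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated multiple-baseline data: 6 cases, 20 sessions each, staggered phase change
rng = np.random.default_rng(11)
rows = []
for case in range(6):
    start = 6 + 2 * case                 # staggered introduction of the IV
    intercept = rng.normal(5, 1)         # case-specific starting level
    for session in range(1, 21):
        phase = int(session >= start)
        y = intercept + 0.05 * session + 1.5 * phase + rng.normal(0, 0.5)
        rows.append({"case": case, "session": session, "phase": phase, "y": y})
df = pd.DataFrame(rows)

# Two-level model: sessions nested within cases, random intercepts across cases.
# The phase coefficient estimates the average level shift after the IV is introduced.
mlm = smf.mixedlm("y ~ session + phase", data=df, groups="case").fit()
print(mlm.summary())
```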

Autoregressive moving averages (ARMA; e.g., Browne & Nesselroade, 2005 ; Liu & Hudack, 1995 ; Tiao & Box, 1981 )

Two primary points have been raised regarding ARMA modeling: length of the data stream and feasibility of the modeling technique. ARMA models generally require 30–50 observations in each phase when analyzing a single-subject experiment (e.g., Borckardt et al., 2008 ; Box & Jenkins, 1970 ), which is often difficult to satisfy in applied psychological research applications. However, ARMA models in an SEM framework, such as those described by du Toit & Browne (2001) , are well suited for longitudinal panel data with few observations and many subjects. Autoregressive SEM models are also applicable under similar conditions. Model-fitting options are available in SPSS and R, and in SAS via PROC ARIMA.

ARMA modeling also requires considerable training in the method and rather advanced knowledge about statistical methods (e.g., Kratochwill & Levin, 1992 ). However, Brossart et al. (2006) point out that ARMA-based approaches can produce excellent results when there is no “model finding” and a simple lag-1 model, with no differencing and no moving average, is used. This approach can be taken for many SCED applications when phase- or slope-change analyses are of interest with a single, or very few, subjects. As already mentioned, this method is particularly useful when one is seeking to account for autocorrelation or other over-time variations that are not directly related to the experimental or intervention effect of interest (i.e., detrending). ARMA and other time-series analysis methods require missing data to be managed prior to analysis by means of options such as full information maximum likelihood estimation, multiple imputation, or the Kalman filter (see Box & Jenkins, 1970 ; Hamilton, 1994 ; Shumway & Stoffer, 1982 ) because listwise deletion has been shown to result in inaccurate time-series parameter estimates ( Velicer & Colby, 2005a ).
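The sketch below illustrates the simple lag-1 model described by Brossart et al. (2006), with no differencing and no moving average, applied to a simulated N-of-1 interrupted time series; the phase indicator is entered as an exogenous regressor so that the level change is estimated while the lag-1 autocorrelation is modeled. The data and the use of Python's statsmodels are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated N-of-1 interrupted time series: 20 baseline + 20 intervention sessions
rng = np.random.default_rng(5)
n_a, n_b, phi, effect = 20, 20, 0.5, 2.0
e = np.zeros(n_a + n_b)
for t in range(1, n_a + n_b):
    e[t] = phi * e[t - 1] + rng.normal(0, 1)      # lag-1 autocorrelated error
phase = np.array([0] * n_a + [1] * n_b)
y = pd.Series(5.0 + effect * phase + e)

# Simple lag-1 autoregressive model with no differencing and no moving average,
# with the phase indicator as an exogenous regressor (level-change analysis).
res = SARIMAX(y, exog=phase, order=(1, 0, 0), trend="c").fit(disp=False)
print(res.summary())   # the exog coefficient estimates the level change
```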

Standardized mean differences

Standardized mean differences approaches include the common Cohen’s d , Glass’s Delta, and Hedges’s g that are used in the analysis of group designs. The computational properties of mean differences approaches to SCEDs are identical to those used for group comparisons, except that the results represent within-case variation instead of the variation between groups, which suggests that the obtained effect sizes are not interpretively equivalent. The advantage of the mean differences approach is its simplicity of calculation and also its familiarity to social scientists. The primary drawback of these approaches is that they were not developed to contend with autocorrelated data. However, Manolov and Solanas (2008) reported that autocorrelation least affected effect sizes calculated using standardized mean differences approaches. To the applied-research scientist this likely represents the most accessible analytic approach, because statistical software is not required to calculate these effect sizes. The resultant effect sizes of single-subject standardized mean differences analysis must be interpreted cautiously because their relation to standard effect size benchmarks, such as those provided by Cohen (1988) , is unknown. Standardized mean differences approaches are appropriate only when examining significant differences between phases of the study and cannot illuminate trajectories or relationships between variables.
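A minimal sketch of a within-case standardized mean difference follows. It uses a pooled within-phase standard deviation, which is only one of several variants in the literature (some authors standardize by the baseline standard deviation alone, in the Glass tradition), and the data are illustrative.

```python
import numpy as np

def within_case_smd(baseline, treatment):
    """Standardized mean difference between two phases of one case,
    standardized by a pooled within-phase SD (one of several variants)."""
    baseline = np.asarray(baseline, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    n_a, n_b = len(baseline), len(treatment)
    pooled_var = (((n_a - 1) * baseline.var(ddof=1) +
                   (n_b - 1) * treatment.var(ddof=1)) / (n_a + n_b - 2))
    return (treatment.mean() - baseline.mean()) / np.sqrt(pooled_var)

baseline = [4, 5, 6, 5, 4, 5, 6, 5]
treatment = [8, 9, 7, 9, 8, 10, 9, 8]
print(round(within_case_smd(baseline, treatment), 2))
```

Because the variation in the denominator is within-case rather than between-groups, the resulting value should not be judged against group-design benchmarks, as noted above.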

Other analytic approaches

Researchers have offered other analytic methods to deal with the characteristics of SCED data. A number of methods for analyzing N -of-1 experiments have been developed. Borckardt’s (2006) Simulation Modeling Analysis (SMA) program provides a method for analyzing level- and slope-change in short (<30 observations per phase; see Borckardt et al., 2008 ), autocorrelated data streams that is statistically sophisticated, yet accessible and freely available to typical psychological scientists and clinicians. A replicated single-case time-series design conducted by Smith, Handler, & Nash (2010) provides an example of SMA application. The Singwin Package, described in Bloom et al. (2003) , is another easy-to-use parametric approach for analyzing single-case experiments. A number of nonparametric approaches have also been developed that emerged from the visual analysis tradition: Some examples include percent nonoverlapping data ( Scruggs, Mastropieri, & Casto, 1987 ) and nonoverlap of all pairs ( Parker & Vannest, 2009 ); however, these methods have come under scrutiny, and Wolery, Busick, Reichow, and Barton (2010) have suggested abandoning them altogether. Each of these methods appears to be well suited for managing specific data characteristics, but they should not be used to analyze data streams beyond their intended purpose until additional empirical research is conducted.
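For illustration, the sketch below computes two of the nonoverlap indices mentioned above, percent nonoverlapping data and nonoverlap of all pairs, for a hypothetical AB data stream in which improvement is an increase; the data are illustrative, and, as noted, these indices should be interpreted with caution.

```python
import numpy as np

def percent_nonoverlapping_data(baseline, treatment):
    """PND: percentage of treatment observations exceeding the highest
    baseline observation (assumes improvement is an increase)."""
    baseline, treatment = np.asarray(baseline), np.asarray(treatment)
    return 100.0 * np.mean(treatment > baseline.max())

def nonoverlap_of_all_pairs(baseline, treatment):
    """NAP: proportion of all baseline-treatment pairs in which the
    treatment observation exceeds the baseline observation (ties count 0.5)."""
    baseline, treatment = np.asarray(baseline), np.asarray(treatment)
    diffs = treatment[None, :] - baseline[:, None]
    return (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / diffs.size

baseline = [4, 5, 6, 5, 4, 5]
treatment = [6, 7, 8, 7, 9, 8, 7]
print(percent_nonoverlapping_data(baseline, treatment))
print(round(nonoverlap_of_all_pairs(baseline, treatment), 2))
```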

Combining SCED Results

Beyond the issue of single-case analysis is the matter of integrating and meta-analyzing the results of single-case experiments. SCEDs have been given short shrift in the majority of meta-analytic literature ( Littell, Corcoran, & Pillai, 2008 ; Shadish et al., 2008 ), with only a few exceptions ( Carr et al., 1999 ; Horner & Spaulding, 2010 ). Currently, few proven methods exist for integrating the results of multiple single-case experiments. Allison and Gorman (1993) and Shadish et al. (2008) present the problems associated with meta-analyzing single-case effect sizes, and W. P. Jones (2003) , Manolov and Solanas (2008) , Scruggs and Mastropieri (1998) , and Shadish et al. (2008) offer four different potential statistical solutions for this problem, none of which appear to have received consensus amongst researchers. The ability to synthesize single-case effect sizes, and particularly to compare them with effect sizes garnered through group design research, is undoubtedly necessary to increase SCED proliferation.

Discussion of Review Results and Coding of Analytic Methods

The coding criteria for this review were quite stringent in terms of what was considered to be either visual or statistical analysis. For visual analysis to be coded as present, it was necessary for the authors to self-identify as having used a visual analysis method. In many cases, it could likely be inferred that visual analysis had been used, but it was often not specified. Similarly, statistical analysis was reserved for analytic methods that produced an effect size. 3 Analyses that involved comparing magnitude of change using raw count data or percentages were not considered rigorous enough. These two narrow definitions of visual and statistical analysis contributed to the high rate of unreported analytic method, shown in Table 1 (52.3%). A better representation of the use of visual and statistical analysis is likely the percentage of studies among those that reported a method of analysis. Under these parameters, 41.5% used visual analysis and 31.3% used statistical analysis. Included in these figures are studies that included both visual and statistical methods (11%). These figures are slightly higher than the estimate of Brossart et al. (2006) , who reported that statistical analysis is used in about 20% of SCED studies. Visual analysis undoubtedly continues to be the most prevalent method, but there appears to be a trend toward increased use of statistical approaches, which is likely only to gain momentum as innovations continue.

Analysis Standards

The standards selected for inclusion in this review offer minimal direction in the way of analyzing the results of SCED research. Table 5 summarizes analysis-related information provided by the six reviewed sources for SCED standards. Visual analysis is acceptable to DIV12 and DIV16, along with unspecified statistical approaches. In the WWC standards, visual analysis is the acceptable method of determining an intervention effect, with statistical analyses and randomization tests permissible as a complementary or supporting method to the results of visual analysis methods. However, the authors of the WWC standards state, “As the field reaches greater consensus about appropriate statistical analyses and quantitative effect-size measures, new standards for effect demonstration will need to be developed” ( Kratochwill et al., 2010 , p.16). The NRP and DIV12 seem to prefer statistical methods when they are warranted. The Tate et al. scale accepts only statistical analysis with the reporting of an effect size. Only the WWC and DIV16 provide guidance in the use of statistical analysis procedures: The WWC “recommends” nonparametric and parametric approaches, multilevel modeling, and regression when statistical analysis is used. DIV16 refers the reader to Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) for direction in this matter. Statistical analysis of daily diary and EMA methods is similarly unsettled. Stone and Shiffman (2002) ask for a detailed description of the statistical procedures used, in order for the approach to be replicated and evaluated. They provide direction for analyzing aggregated and disaggregated data. They also aptly note that because many different modes of analysis exist, researchers must carefully match the analytic approach to the hypotheses being pursued.

Limitations and Future Directions

This review has a number of limitations that leave the door open for future study of SCED methodology. Publication bias is a concern in any systematic review. This is particularly true for this review because the search was limited to articles published in peer-reviewed journals. This strategy was chosen in order to inform changes in the practice of reporting and of reviewing, but it also is likely to have inflated the findings regarding the methodological rigor of the reviewed works. Inclusion of book chapters, unpublished studies, and dissertations would likely have yielded somewhat different results.

A second concern is the stringent coding criteria in regard to the analytic methods and the broad categorization into visual and statistical analytic approaches. The selection of an appropriate method for analyzing SCED data is perhaps the murkiest area of this type of research. Future reviews that evaluate the appropriateness of selected analytic strategies and provide specific decision-making guidelines for researchers would be a very useful contribution to the literature. Although six sources of standards applicable to SCED research were reviewed in this article, five of them were developed almost exclusively to inform psychological and behavioral intervention research. The principles of SCED research remain the same in different contexts, but there is a need for non–intervention scientists to weigh in on these standards.

Finally, this article provides a first step in the synthesis of the available SCED reporting guidelines. However, it does not resolve disagreements, nor does it purport to be a definitive source. In the future, an entity with the authority to construct such a document ought to convene and establish a foundational, adaptable, and agreed-upon set of guidelines that cuts across subspecialties and is applicable to many, if not all, areas of psychological research, which is perhaps an idealistic goal. Certain preferences will undoubtedly continue to dictate what constitutes acceptable practice in each subspecialty of psychology, but uniformity along critical dimensions will help advance SCED research.

Conclusions

The first decade of the twenty-first century has seen an upwelling of SCED research across nearly all areas of psychology. This article contributes updated benchmarks in terms of the frequency with which SCED design and methodology characteristics are used, including the number of baseline observations, assessment and measurement practices, and data analytic approaches, most of which are largely consistent with previously reported benchmarks. However, this review is much broader than those of previous research teams and also breaks down the characteristics of single-case research by the predominant design. With the recent SCED proliferation came a number of standards for the conduct and reporting of such research. This article also provides a much-needed synthesis of recent SCED standards that can inform the work of researchers, reviewers, and funding agencies conducting and evaluating single-case research, which reveals many areas of consensus as well as areas of significant disagreement. The question of where to go next is highly relevant at this point. The majority of the research design and measurement characteristics of the SCED are reasonably well established, and the results of this review suggest general practice that is in accord with existing standards and guidelines, at least in regard to published peer-reviewed works. In general, the published literature appears to be meeting the basic design and measurement requirements to ensure adequate internal validity of SCED studies.

The lack of consensus regarding the superiority of any one analytic method stands out as an area of divergence. Judging by the current literature, researchers will need to carefully select a method that matches the research design, hypotheses, and intended conclusions of the study, while also considering the most up-to-date empirical support for the chosen analytic method, whether it be visual or statistical. In some cases the number of observations and subjects in the study will dictate which analytic methods can and cannot be used. In the case of the true N -of-1 experiment, there are relatively few sound analytic methods, and even fewer that are robust with shorter data streams (see Borckardt et al., 2008 ). As the number of observations and subjects increases, sophisticated modeling techniques, such as MLM, SEM, and ARMA, become applicable. Trends in the data and autocorrelation further obfuscate the development of a clear statistical analysis selection algorithm, which currently does not exist. Autocorrelation was rarely addressed or discussed in the articles reviewed, except when the selected statistical analysis dictated consideration. Given the empirical evidence regarding the effect of autocorrelation on visual and statistical analysis, researchers need to address this more explicitly. Missing-data considerations are similarly left out when they are unnecessary for analytic purposes. As newly devised statistical analysis approaches mature and are compared with one another for appropriateness in specific SCED applications, guidelines for statistical analysis will necessarily be revised. Similarly, empirically derived guidance, in the form of a decision tree, must be developed to ensure application of appropriate methods based on characteristics of the data and the research questions being addressed. Researchers could also benefit from tutorials and comparative reviews of different software packages: This is a needed area of future research. Powerful and reliable statistical analyses help move the SCED up the ladder of experimental designs and attenuate the view that the method applies primarily to pilot studies and idiosyncratic research questions and situations.

Another potential future advancement of SCED research comes in the area of measurement. Currently, SCED research gives significant weight to observer ratings and seems to discourage other data collection methods. This is likely due to the origins of the SCED in behavioral assessment and applied behavior analysis, which remains a present-day stronghold. The dearth of EMA and diary-like sampling procedures within the SCED research reviewed, despite their ever-growing prevalence in the larger psychological research arena, highlights an area for potential expansion. Observational measurement, although reliable and valid in many contexts, is time and resource intensive and not feasible in all areas in which psychologists conduct research. It seems that numerous untapped research questions are stifled because of this measurement constraint. SCED researchers developing updated standards in the future should include guidelines for the appropriate measurement requirements of non-observer-reported data. For example, the results of this review indicate that reporting of repeated measurements, particularly the high-density type found in diary and EMA sampling strategies, ought to be more clearly spelled out, with specific attention paid to autocorrelation and trend in the data streams. In the event that SCED researchers adopt self-reported assessment strategies as viable alternatives to observation, a set of standards explicitly identifying the necessary psychometric properties of the measures and specific items used would be in order.

Along similar lines, SCED researchers could take a page from other areas of psychology that champion multimethod and multisource evaluation of primary outcomes. In this way, the long-standing tradition of observational assessment and the cutting-edge technological methods of EMA and daily diary could be married with the goal of strengthening conclusions drawn from SCED research and enhancing the validity of self-reported outcome assessment. The results of this review indicate that they rarely intersect today, and I urge SCED researchers to adopt other methods of assessment informed by time-series, daily diary, and EMA methods. The EMA standards could serve as a jumping-off point for refined measurement and assessment reporting standards in the context of multimethod SCED research.

One limitation of the current SCED standards is their relatively limited scope. To clarify, with the exception of the Stone & Shiffman EMA reporting guidelines, the other five sources of standards were developed in the context of designing and evaluating intervention research. Although this is likely to remain their primary emphasis, SCEDs are capable of addressing other pertinent research questions in the psychological sciences, and the current standards only roughly approximate the salient crosscutting SCED characteristics. I propose developing broad SCED guidelines that address the specific design, measurement, and analysis issues in a manner that allows them to be useful across applications, as opposed to focusing solely on intervention effects. To accomplish this task, methodology experts across subspecialties in psychology would need to convene. Admittedly this is no small task.

Perhaps funding agencies will also recognize the fiscal and practical advantages of SCED research in certain areas of psychology. One example is in the field of intervention effectiveness, efficacy, and implementation research. A few exemplary studies using robust forms of SCED methodology are needed in the literature. Case-based methodologies will never supplant the group design as the gold standard in experimental applications, nor should that be the goal. Instead, SCEDs provide a viable and valid alternative experimental methodology that could stimulate new areas of research and answer questions that group designs cannot. With the astonishing number of studies emerging every year that use single-case designs and explore the methodological aspects of the design, we are poised to witness and be a part of an upsurge in the sophisticated application of the SCED. When federal grant-awarding agencies and journal editors begin to use formal standards while making funding and publication decisions, the field will benefit.

Last, for the practice of SCED research to continue and mature, graduate training programs must provide students with instruction in all areas of the SCED. This is particularly true of statistical analysis techniques that are not often taught in departments of psychology and education, where the vast majority of SCED studies seem to be conducted. It is quite the conundrum that the best available statistical analytic methods are often cited as being inaccessible to social science researchers who conduct this type of research. This need not be the case. To move the field forward, emerging scientists must be able to apply the most state-of-the-art research designs, measurement techniques, and analytic methods.

Acknowledgments

Research support for the author was provided by research training grant MH20012 from the National Institute of Mental Health, awarded to Elizabeth A. Stormshak. The author gratefully acknowledges Robert Horner and Laura Lee McIntyre, University of Oregon; Michael Nash, University of Tennessee; John Ferron, University of South Florida; the Action Editor, Lisa Harlow, and the anonymous reviewers for their thoughtful suggestions and guidance in shaping this article; Cheryl Mikkola for her editorial support; and Victoria Mollison for her assistance in the systematic review process.

Appendix. Results of Systematic Review Search and Studies Included in the Review

PsycINFO search conducted July 2011.

  • Alternating treatment design
  • Changing criterion design
  • Experimental case*
  • Multiple baseline design
  • Replicated single-case design
  • Simultaneous treatment design
  • Time-series design
  • Quantitative study OR treatment outcome/randomized clinical trial
  • NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study
  • Publication range: 2000–2010
  • Published in peer-reviewed journals
  • Available in the English Language

Bibliography

(* indicates inclusion in study: N = 409)

1 Autocorrelation estimates in this range can be caused by trends in the data streams, which creates complications in terms of detecting level-change effects. The Smith et al. (in press) study used a Monte Carlo simulation to control for trends in the data streams, but trends are likely to exist in real-world data with high lag-1 autocorrelation estimates.

2 The author makes no endorsement regarding the superiority of any statistical program or package over another by their mention or exclusion in this article. The author also has no conflicts of interest in this regard.

3 However, it should be noted that it was often very difficult to locate an actual effect size reported in studies that used statistical analysis. Although this issue would likely have added little to this review, it does inhibit the inclusion of the results in meta-analysis.

  • Albright JJ, Marinova DM. Estimating multilevel models using SPSS, Stata, and SAS. Indiana University; 2010. Retrieved from http://www.iub.edu/%7Estatmath/stat/all/hlm/hlm.pdf . [ Google Scholar ]
  • Allison DB, Gorman BS. Calculating effect sizes for meta-analysis: The case of the single case. Behavior Research and Therapy. 1993; 31 (6):621–631. doi: 10.1016/0005-7967(93)90115-B. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Alloy LB, Just N, Panzarella C. Attributional style, daily life events, and hopelessness depression: Subtype validation by prospective variability and specificity of symptoms. Cognitive Therapy Research. 1997; 21 :321–344. doi: 10.1023/A:1021878516875. [ CrossRef ] [ Google Scholar ]
  • Arbuckle JL. Amos (Version 7.0) Chicago, IL: SPSS, Inc; 2006. [ Google Scholar ]
  • Barlow DH, Nock MK, Hersen M. Single case research designs: Strategies for studying behavior change. 3. New York, NY: Allyn and Bacon; 2008. [ Google Scholar ]
  • Barrett LF, Barrett DJ. An introduction to computerized experience sampling in psychology. Social Science Computer Review. 2001; 19 (2):175–185. doi: 10.1177/089443930101900204. [ CrossRef ] [ Google Scholar ]
  • Bloom M, Fisher J, Orme JG. Evaluating practice: Guidelines for the accountable professional. 4. Boston, MA: Allyn & Bacon; 2003. [ Google Scholar ]
  • Bolger N, Davis A, Rafaeli E. Diary methods: Capturing life as it is lived. Annual Review of Psychology. 2003; 54 :579–616. doi: 10.1146/annurev.psych.54.101601.145030. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borckardt JJ. Simulation Modeling Analysis: Time series analysis program for short time series data streams (Version 8.3.3) Charleston, SC: Medical University of South Carolina; 2006. [ Google Scholar ]
  • Borckardt JJ, Nash MR, Murphy MD, Moore M, Shaw D, O’Neil P. Clinical practice as natural laboratory for psychotherapy research. American Psychologist. 2008; 63 :1–19. doi: 10.1037/0003-066X.63.2.77. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Borsboom D, Mellenbergh GJ, van Heerden J. The theoretical status of latent variables. Psychological Review. 2003; 110 (2):203–219. doi: 10.1037/0033-295X.110.2.203. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bower GH. Mood and memory. American Psychologist. 1981; 36 (2):129–148. doi: 10.1037/0003-066x.36.2.129. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Box GEP, Jenkins GM. Time-series analysis: Forecasting and control. San Francisco, CA: Holden-Day; 1970. [ Google Scholar ]
  • Brossart DF, Parker RI, Olson EA, Mahadevan L. The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification. 2006; 30 (5):531–563. doi: 10.1177/0145445503261167. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Browne MW, Nesselroade JR. Representing psychological processes with dynamic factor models: Some promising uses and extensions of autoregressive moving average time series models. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics: A festschrift for Roderick P McDonald. Mahwah, NJ: Lawrence Erlbaum Associates Publishers; 2005. pp. 415–452. [ Google Scholar ]
  • Busk PL, Marascuilo LA. Statistical analysis in single-case research: Issues, procedures, and recommendations, with applications to multiple behaviors. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ, England: Lawrence Erlbaum Associates, Inc; 1992. pp. 159–185. [ Google Scholar ]
  • Busk PL, Marascuilo RC. Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment. 1988; 10 :229–242. [ Google Scholar ]
  • Campbell JM. Statistical comparison of four effect sizes for single-subject designs. Behavior Modification. 2004; 28 (2):234–246. doi: 10.1177/0145445503259264. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Carr EG, Horner RH, Turnbull AP, Marquis JG, Magito McLaughlin D, McAtee ML, Doolabh A. Positive behavior support for people with developmental disabilities: A research synthesis. Washington, DC: American Association on Mental Retardation; 1999. [ Google Scholar ]
  • Center BA, Skiba RJ, Casey A. A methodology for the quantitative synthesis of intra-subject design research. Journal of Educational Science. 1986; 19 :387–400. doi: 10.1177/002246698501900404. [ CrossRef ] [ Google Scholar ]
  • Chambless DL, Hollon SD. Defining empirically supported therapies. Journal of Consulting and Clinical Psychology. 1998; 66 (1):7–18. doi: 10.1037/0022-006X.66.1.7. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chambless DL, Ollendick TH. Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology. 2001; 52 :685–716. doi: 10.1146/annurev.psych.52.1.685. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chow S-M, Ho M-hR, Hamaker EL, Dolan CV. Equivalence and differences between structural equation modeling and state-space modeling techniques. Structural Equation Modeling. 2010; 17 (2):303–332. doi: 10.1080/10705511003661553. [ CrossRef ] [ Google Scholar ]
  • Cohen J. Statistical power analysis for the behavioral sciences. 2. Hillsdale, NJ: Erlbaum; 1988. [ Google Scholar ]
  • Cohen J. The earth is round (p < .05) American Psychologist. 1994; 49 :997–1003. doi: 10.1037/0003-066X.49.12.997. [ CrossRef ] [ Google Scholar ]
  • Crosbie J. Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology. 1993; 61 (6):966–974. doi: 10.1037/0022-006X.61.6.966. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dattilio FM, Edwards JA, Fishman DB. Case studies within a mixed methods paradigm: Toward a resolution of the alienation between researcher and practitioner in psychotherapy research. Psychotherapy: Theory, Research, Practice, Training. 2010; 47 (4):427–441. doi: 10.1037/a0021181. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dempster A, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977; 39 (1):1–38. [ Google Scholar ]
  • Des Jarlais DC, Lyles C, Crepaz N. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. American Journal of Public Health. 2004; 94 (3):361–366. doi: 10.2105/ajph.94.3.361. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Diggle P, Liang KY. Analyses of longitudinal data. New York: Oxford University Press; 2001. [ Google Scholar ]
  • Doss BD, Atkins DC. Investigating treatment mediators when simple random assignment to a control group is not possible. Clinical Psychology: Science and Practice. 2006; 13 (4):321–336. doi: 10.1111/j.1468-2850.2006.00045.x. [ CrossRef ] [ Google Scholar ]
  • du Toit SHC, Browne MW. The covariance structure of a vector ARMA time series. In: Cudeck R, du Toit SHC, Sörbom D, editors. Structural equation modeling: Present and future. Lincolnwood, IL: Scientific Software International; 2001. pp. 279–314. [ Google Scholar ]
  • du Toit SHC, Browne MW. Structural equation modeling of multivariate time series. Multivariate Behavioral Research. 2007; 42 :67–101. doi: 10.1080/00273170701340953. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fechner GT. Elemente der psychophysik [Elements of psychophysics] Leipzig, Germany: Breitkopf & Hartel; 1889. [ Google Scholar ]
  • Ferron J, Sentovich C. Statistical power of randomization tests used with multiple-baseline designs. The Journal of Experimental Education. 2002; 70 :165–178. doi: 10.1080/00220970209599504. [ CrossRef ] [ Google Scholar ]
  • Ferron J, Ware W. Analyzing single-case data: The power of randomization tests. The Journal of Experimental Education. 1995; 63 :167–178. [ Google Scholar ]
  • Fox J. TEACHER’S CORNER: Structural equation modeling with the sem package in R. Structural Equation Modeling: A Multidisciplinary Journal. 2006; 13 (3):465–486. doi: 10.1207/s15328007sem1303_7. [ CrossRef ] [ Google Scholar ]
  • Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates; 1997. [ Google Scholar ]
  • Franklin RD, Gorman BS, Beasley TM, Allison DB. Graphical display and visual analysis. In: Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahway, NJ: Lawrence Erlbaum Associates, Publishers; 1997. pp. 119–158. [ Google Scholar ]
  • Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin. 1995; 118 (3):392–404. doi: 10.1037/0033-2909.118.3.392. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Green AS, Rafaeli E, Bolger N, Shrout PE, Reis HT. Paper or plastic? Data equivalence in paper and electronic diaries. Psychological Methods. 2006; 11 (1):87–105. doi: 10.1037/1082-989X.11.1.87. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hamilton JD. Time series analysis. Princeton, NJ: Princeton University Press; 1994. [ Google Scholar ]
  • Hammond D, Gast DL. Descriptive analysis of single-subject research designs: 1983–2007. Education and Training in Autism and Developmental Disabilities. 2010; 45 :187–202. [ Google Scholar ]
  • Hanson MD, Chen E. Daily stress, cortisol, and sleep: The moderating role of childhood psychosocial environments. Health Psychology. 2010; 29 (4):394–402. doi: 10.1037/a0019879. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Harvey AC. Forecasting, structural time series models and the Kalman filter. Cambridge, MA: Cambridge University Press; 2001. [ Google Scholar ]
  • Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single-subject research to identify evidence-based practice in special education. Exceptional Children. 2005; 71 :165–179. [ Google Scholar ]
  • Horner RH, Spaulding S. Single-case research designs. In: Salkind NJ, editor. Encyclopedia of research design. Thousand Oaks, CA: Sage Publications; 2010. [ Google Scholar ]
  • Horton NJ, Kleinman KP. Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician. 2007; 61 (1):79–90. doi: 10.1198/000313007X172556. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hser Y, Shen H, Chou C, Messer SC, Anglin MD. Analytic approaches for assessing long-term treatment effects. Evaluation Review. 2001; 25 (2):233–262. doi: 10.1177/0193841X0102500206. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Huitema BE. Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment. 1985; 7 (2):107–118. [ Google Scholar ]
  • Huitema BE, McKean JW. Reduced bias autocorrelation estimation: Three jackknife methods. Educational and Psychological Measurement. 1994; 54 (3):654–665. doi: 10.1177/0013164494054003008. [ CrossRef ] [ Google Scholar ]
  • Ibrahim JG, Chen M-H, Lipsitz SR, Herring AH. Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association. 2005; 100 (469):332–346. doi: 10.1198/016214504000001844. [ CrossRef ] [ Google Scholar ]
  • Institute of Medicine. Reducing risks for mental disorders: Frontiers for preventive intervention research. Washington, DC: National Academy Press; 1994. [ PubMed ] [ Google Scholar ]
  • Jacobsen NS, Christensen A. Studying the effectiveness of psychotherapy: How well can clinical trials do the job? American Psychologist. 1996; 51 :1031–1039. doi: 10.1037/0003-066X.51.10.1031. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones RR, Vaught RS, Weinrott MR. Time-series analysis in operant research. Journal of Behavior Analysis. 1977; 10 (1):151–166. doi: 10.1901/jaba.1977.10-151. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones WP. Single-case time series with Bayesian analysis: A practitioner’s guide. Measurement and Evaluation in Counseling and Development. 2003; 36 :28–39. [ Google Scholar ]
  • Kanfer H. Self-monitoring: Methodological limitations and clinical applications. Journal of Consulting and Clinical Psychology. 1970; 35 (2):148–152. doi: 10.1037/h0029874. [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Drawing valid inferences from case studies. Journal of Consulting and Clinical Psychology. 1981; 49 (2):183–192. doi: 10.1037/0022-006X.49.2.183. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology. 2007; 3 :1–27. doi: 10.1146/annurev.clinpsy.3.022806.091432. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care. American Psychologist. 2008; 63 (3):146–159. doi: 10.1037/0003-066X.63.3.146. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Understanding how and why psychotherapy leads to change. Psychotherapy Research. 2009; 19 (4):418–428. doi: 10.1080/10503300802448899. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Single-case research designs: Methods for clinical and applied settings. 2. New York, NY: Oxford University Press; 2010. [ Google Scholar ]
  • Kirk RE. Practical significance: A concept whose time has come. Educational and Psychological Measurement. 1996; 56 :746–759. doi: 10.1177/0013164496056005002. [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR. Preparing psychologists for evidence-based school practice: Lessons learned and challenges ahead. American Psychologist. 2007; 62 :829–843. doi: 10.1037/0003-066X.62.8.829. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR. Single-case designs technical documentation. 2010 Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf .
  • Kratochwill TR, Levin JR. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1992. [ Google Scholar ]
  • Kratochwill TR, Levin JR. Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods. 2010; 15 (2):124–144. doi: 10.1037/a0017736. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Levin JR, Horner RH, Swoboda C. Visual analysis of single-case intervention research: Conceptual and methodological considerations (WCER Working Paper No. 2011-6) 2011 Retrieved from University of Wisconsin–Madison, Wisconsin Center for Education Research website: http://www.wcer.wisc.edu/publications/workingPapers/papers.php .
  • Lambert D. Zero-inflated poisson regression, with an application to defects in manufacturing. Technometrics. 1992; 34 (1):1–14. [ Google Scholar ]
  • Lambert MJ, Hansen NB, Harmon SC. Developing and Delivering Practice-Based Evidence. John Wiley & Sons, Ltd; 2010. Outcome Questionnaire System (The OQ System): Development and practical applications in healthcare settings; pp. 139–154. [ Google Scholar ]
  • Littell JH, Corcoran J, Pillai VK. Systematic reviews and meta-analysis. New York: Oxford University Press; 2008. [ Google Scholar ]
  • Liu LM, Hudack GB. The SCA statistical system. Vector ARMA modeling of multiple time series. Oak Brook, IL: Scientific Computing Associates Corporation; 1995. [ Google Scholar ]
  • Lubke GH, Muthén BO. Investigating population heterogeneity with factor mixture models. Psychological Methods. 2005; 10 (1):21–39. doi: 10.1037/1082-989x.10.1.21. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Manolov R, Solanas A. Comparing N = 1 effect sizes in presence of autocorrelation. Behavior Modification. 2008; 32 (6):860–875. doi: 10.1177/0145445508318866. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Marshall RJ. Autocorrelation estimation of time series with randomly missing observations. Biometrika. 1980; 67 (3):567–570. doi: 10.1093/biomet/67.3.567. [ CrossRef ] [ Google Scholar ]
  • Matyas TA, Greenwood KM. Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis. 1990; 23 (3):341–351. doi: 10.1901/jaba.1990.23-341. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Chair Members of the Task Force on Evidence-Based Interventions in School Psychology. Procedural and coding manual for review of evidence-based interventions. 2003 Retrieved July 18, 2011 from http://www.sp-ebi.org/documents/_workingfiles/EBImanual1.pdf .
  • Moher D, Schulz KF, Altman DF the CONSORT Group. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. Journal of the American Medical Association. 2001; 285 :1987–1991. doi: 10.1001/jama.285.15.1987. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Morgan DL, Morgan RK. Single-participant research design: Bringing science to managed care. American Psychologist. 2001; 56 (2):119–127. doi: 10.1037/0003-066X.56.2.119. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Muthén BO, Curran PJ. General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods. 1997; 2 (4):371–402. doi: 10.1037/1082-989x.2.4.371. [ CrossRef ] [ Google Scholar ]
  • Muthén LK, Muthén BO. Mplus (Version 6.11) Los Angeles, CA: Muthén & Muthén; 2010. [ Google Scholar ]
  • Nagin DS. Analyzing developmental trajectories: A semiparametric, group-based approach. Psychological Methods. 1999; 4 (2):139–157. doi: 10.1037/1082-989x.4.2.139. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • National Institute of Child Health and Human Development. Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769) Washington, DC: U.S. Government Printing Office; 2000. [ Google Scholar ]
  • Olive ML, Smith BW. Effect size calculations and single subject designs. Educational Psychology. 2005; 25 (2–3):313–324. doi: 10.1080/0144341042000301238. [ CrossRef ] [ Google Scholar ]
  • Oslin DW, Cary M, Slaymaker V, Colleran C, Blow FC. Daily ratings measures of alcohol craving during an inpatient stay define subtypes of alcohol addiction that predict subsequent risk for resumption of drinking. Drug and Alcohol Dependence. 2009; 103 (3):131–136. doi: 10.1016/J.Drugalcdep.2009.03.009. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Palermo TP, Valenzuela D, Stork PP. A randomized trial of electronic versus paper pain diaries in children: Impact on compliance, accuracy, and acceptability. Pain. 2004; 107 (3):213–219. doi: 10.1016/j.pain.2003.10.005. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parker RI, Brossart DF. Evaluating single-case research data: A comparison of seven statistical methods. Behavior Therapy. 2003; 34 (2):189–211. doi: 10.1016/S0005-7894(03)80013-8. [ CrossRef ] [ Google Scholar ]
  • Parker RI, Cryer J, Byrns G. Controlling baseline trend in single case research. School Psychology Quarterly. 2006; 21 (4):418–440. doi: 10.1037/h0084131. [ CrossRef ] [ Google Scholar ]
  • Parker RI, Vannest K. An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy. 2009; 40 (4):357–367. doi: 10.1016/j.beth.2008.10.006. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parsonson BS, Baer DM. The analysis and presentation of graphic data. In: Kratochwill TR, editor. Single subject research. New York, NY: Academic Press; 1978. pp. 101–166. [ Google Scholar ]
  • Parsonson BS, Baer DM. The visual analysis of data, and current research into the stimuli controlling it. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ; England: Lawrence Erlbaum Associates, Inc; 1992. pp. 15–40. [ Google Scholar ]
  • Piasecki TM, Hufford MR, Solham M, Trull TJ. Assessing clients in their natural environments with electronic diaries: Rationale, benefits, limitations, and barriers. Psychological Assessment. 2007; 19 (1):25–43. doi: 10.1037/1040-3590.19.1.25. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2005. [ Google Scholar ]
  • Ragunathan TE. What do we do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health. 2004; 25 :99–117. doi: 10.1146/annurev.publhealth.25.102802.124410. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Raudenbush SW, Bryk AS, Congdon R. HLM 7 Hierarchical Linear and Nonlinear Modeling. Scientific Software International, Inc; 2011. [ Google Scholar ]
  • Redelmeier DA, Kahneman D. Patients’ memories of painful medical treatments: Real-time and retrospective evaluations of two minimally invasive procedures. Pain. 1996; 66 (1):3–8. doi: 10.1016/0304-3959(96)02994-6. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Reis HT. Domains of experience: Investigating relationship processes from three perspectives. In: Erber R, Gilmore R, editors. Theoretical frameworks in personal relationships. Mahwah, NJ: Erlbaum; 1994. pp. 87–110. [ Google Scholar ]
  • Reis HT, Gable SL. Event sampling and other methods for studying everyday experience. In: Reis HT, Judd CM, editors. Handbook of research methods in social and personality psychology. New York, NY: Cambridge University Press; 2000. pp. 190–222. [ Google Scholar ]
  • Robey RR, Schultz MC, Crawford AB, Sinner CA. Single-subject clinical-outcome research: Designs, data, effect sizes, and analyses. Aphasiology. 1999; 13 (6):445–473. doi: 10.1080/026870399402028. [ CrossRef ] [ Google Scholar ]
  • Rossi PH, Freeman HE. Evaluation: A systematic approach. 5. Thousand Oaks, CA: Sage; 1993. [ Google Scholar ]
  • SAS Institute Inc. The SAS system for Windows, Version 9. Cary, NC: SAS Institute Inc; 2008. [ Google Scholar ]
  • Schmidt M, Perels F, Schmitz B. How to perform idiographic and a combination of idiographic and nomothetic approaches: A comparison of time series analyses and hierarchical linear modeling. Journal of Psychology. 2010; 218 (3):166–174. doi: 10.1027/0044-3409/a000026. [ CrossRef ] [ Google Scholar ]
  • Scollon CN, Kim-Pietro C, Diener E. Experience sampling: Promises and pitfalls, strengths and weaknesses. Assessing Well-Being. 2003; 4 :5–35. doi: 10.1007/978-90-481-2354-4_8. [ CrossRef ] [ Google Scholar ]
  • Scruggs TE, Mastropieri MA. Summarizing single-subject research: Issues and applications. Behavior Modification. 1998; 22 (3):221–242. doi: 10.1177/01454455980223001. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Scruggs TE, Mastropieri MA, Casto G. The quantitative synthesis of single-subject research. Remedial and Special Education. 1987; 8 (2):24–33. doi: 10.1177/074193258700800206. [ CrossRef ] [ Google Scholar ]
  • Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin; 2002. [ Google Scholar ]
  • Shadish WR, Rindskopf DM, Hedges LV. The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention. 2008; 3 :188–196. doi: 10.1080/17489530802581603. [ CrossRef ] [ Google Scholar ]
  • Shadish WR, Sullivan KJ. Characteristics of single-case designs used to assess treatment effects in 2008. Behavior Research Methods. 2011; 43 :971–980. doi: 10.3758/s13428-011-0111-y. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sharpley CF. Time-series analysis of behavioural data: An update. Behaviour Change. 1987; 4 :40–45. [ Google Scholar ]
  • Shiffman S, Hufford M, Hickcox M, Paty JA, Gnys M, Kassel JD. Remember that? A comparison of real-time versus retrospective recall of smoking lapses. Journal of Consulting and Clinical Psychology. 1997; 65 :292–300. doi: 10.1037/0022-006X.65.2.292.a. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shiffman S, Stone AA. Ecological momentary assessment: A new tool for behavioral medicine research. In: Krantz DS, Baum A, editors. Technology and methods in behavioral medicine. Mahwah, NJ: Erlbaum; 1998. pp. 117–131. [ Google Scholar ]
  • Shiffman S, Stone AA, Hufford MR. Ecological momentary assessment. Annual Review of Clinical Psychology. 2008; 4 :1–32. doi: 10.1146/annurev.clinpsy.3.022806.091415. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the EM Algorithm. Journal of Time Series Analysis. 1982; 3 (4):253–264. doi: 10.1111/j.1467-9892.1982.tb00349.x. [ CrossRef ] [ Google Scholar ]
  • Skinner BF. The behavior of organisms. New York, NY: Appleton-Century-Crofts; 1938. [ Google Scholar ]
  • Smith JD, Borckardt JJ, Nash MR. Inferential precision in single-case time-series datastreams: How well does the EM Procedure perform when missing observations occur in autocorrelated data? Behavior Therapy. doi: 10.1016/j.beth.2011.10.001. (in press) [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smith JD, Handler L, Nash MR. Therapeutic Assessment for preadolescent boys with oppositional-defiant disorder: A replicated single-case time-series design. Psychological Assessment. 2010; 22 (3):593–602. doi: 10.1037/a0019697. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Snijders TAB, Bosker RJ. Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage; 1999. [ Google Scholar ]
  • Soliday E, Moore KJ, Lande MB. Daily reports and pooled time series analysis: Pediatric psychology applications. Journal of Pediatric Psychology. 2002; 27 (1):67–76. doi: 10.1093/jpepsy/27.1.67. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • SPSS Statistics. Chicago, IL: SPSS Inc; 2011. (Version 20.0.0) [ Google Scholar ]
  • StataCorp. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP; 2011. [ Google Scholar ]
  • Stone AA, Broderick JE, Kaell AT, Deles-Paul PAEG, Porter LE. Does the peak-end phenomenon observed in laboratory pain studies apply to real-world pain in rheumatoid arthritics? Journal of Pain. 2000; 1 :212–217. doi: 10.1054/jpai.2000.7568. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stone AA, Shiffman S. Capturing momentary, self-report data: A proposal for reporting guidelines. Annals of Behavioral Medicine. 2002; 24 :236–243. doi: 10.1207/S15324796ABM2403_09. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stout RL. Advancing the analysis of treatment process. Addiction. 2007; 102 :1539–1545. doi: 10.1111/j.1360-0443.2007.01880.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tate RL, McDonald S, Perdices M, Togher L, Schultz R, Savage S. Rating the methodological quality of single-subject designs and N-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychological Rehabilitation. 2008; 18 (4):385–401. doi: 10.1080/09602010802009201. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thiele C, Laireiter A-R, Baumann U. Diaries in clinical psychology and psychotherapy: A selective review. Clinical Psychology & Psychotherapy. 2002; 9 (1):1–37. doi: 10.1002/cpp.302. [ CrossRef ] [ Google Scholar ]
  • Tiao GC, Box GEP. Modeling multiple time series with applications. Journal of the American Statistical Association. 1981; 76 :802–816. [ Google Scholar ]
  • Tschacher W, Ramseyer F. Modeling psychotherapy process by time-series panel analysis (TSPA) Psychotherapy Research. 2009; 19 (4):469–481. doi: 10.1080/10503300802654496. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. A comparison of missing-data procedures for ARIMA time-series analysis. Educational and Psychological Measurement. 2005a; 65 (4):596–615. doi: 10.1177/0013164404272502. [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. Missing data and the general transformation approach to time series analysis. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics. A festschrift to Roderick P McDonald. Hillsdale, NJ: Lawrence Erlbaum; 2005b. pp. 509–535. [ Google Scholar ]
  • Velicer WF, Fava JL. Time series analysis. In: Schinka J, Velicer WF, Weiner IB, editors. Research methods in psychology. Vol. 2. New York, NY: John Wiley & Sons; 2003. [ Google Scholar ]
  • Wachtel PL. Beyond “ESTs”: Problematic assumptions in the pursuit of evidence-based practice. Psychoanalytic Psychology. 2010; 27 (3):251–272. doi: 10.1037/a0020532. [ CrossRef ] [ Google Scholar ]
  • Watson JB. Behaviorism. New York, NY: Norton; 1925. [ Google Scholar ]
  • Weisz JR, Hawley KM. Finding, evaluating, refining, and applying empirically supported treatments for children and adolescents. Journal of Clinical Child Psychology. 1998; 27 :206–216. doi: 10.1207/s15374424jccp2702_7. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Weisz JR, Hawley KM. Procedural and coding manual for identification of beneficial treatments. Washinton, DC: American Psychological Association, Society for Clinical Psychology, Division 12, Committee on Science and Practice; 1999. [ Google Scholar ]
  • Westen D, Bradley R. Empirically supported complexity. Current Directions in Psychological Science. 2005; 14 :266–271. doi: 10.1111/j.0963-7214.2005.00378.x. [ CrossRef ] [ Google Scholar ]
  • Westen D, Novotny CM, Thompson-Brenner HK. The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting controlled clinical trials. Psychological Bulletin. 2004; 130 :631–663. doi: 10.1037/0033-2909.130.4.631. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wilkinson L, the Task Force on Statistical Inference. Statistical methods in psychology journals: Guidelines and explanations. American Psychologist. 1999; 54 :594–604. doi: 10.1037/0003-066X.54.8.594. [ CrossRef ] [ Google Scholar ]
  • Wolery M, Busick M, Reichow B, Barton EE. Comparison of overlap methods for quantitatively synthesizing single-subject data. The Journal of Special Education. 2010; 44 (1):18–28. doi: 10.1177/0022466908328009. [ CrossRef ] [ Google Scholar ]
  • Wu Z, Huang NE, Long SR, Peng C-K. On the trend, detrending, and variability of nonlinear and nonstationary time series. Proceedings of the National Academy of Sciences. 2007; 104 (38):14889–14894. doi: 10.1073/pnas.0701020104. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]

Single-Case Experimental Design

Single-case experimental design, a versatile research methodology within psychology, holds particular significance in the field of school psychology. This article provides an overview of single-case experimental design, covering its definition, historical development, and key concepts. It delves into various types of single-case designs, including AB, ABA, and Multiple Baseline designs, illustrating their applications within school psychology. The article also explores data collection, analysis methods, and common challenges associated with this methodology. By highlighting its value in empirical research, this article underscores the enduring relevance of single-case experimental design in advancing the understanding and practice of school psychology.

Introduction

Single-case experimental design, a research methodology of profound importance in the realm of psychology, is characterized by its unique approach to investigating behavioral and psychological phenomena. This article examines the key features of this methodology and its applications, with a particular focus on its relevance in school psychology.

Single-case experimental design, often referred to as “N of 1” research, is a methodology that centers on the in-depth examination of individual subjects or cases. Unlike traditional group-based designs, this approach allows researchers to closely study and understand the nuances of a single participant’s behavior, responses, and reactions over time. The precision and depth of insight offered by single-case experimental design have made it an invaluable tool in the field of psychology, facilitating both clinical and experimental research endeavors.

One of the most compelling aspects of this research methodology lies in its applicability to school psychology. In educational settings, understanding the unique needs and challenges of individual students is paramount, and single-case experimental design offers a tailored and systematic way to address these issues. Whether it involves assessing the effectiveness of an intervention for a specific learning disability or studying the impact of a behavior modification program for a student with special needs, single-case experimental design equips school psychologists with a powerful tool to make data-driven decisions and individualized educational plans.

Throughout this article, we will delve into the foundations of single-case experimental design, exploring its historical evolution, key concepts, and core terminology. We will also discuss the various types of single-case designs, including AB, ABA, and Multiple Baseline designs, illustrating their practical applications within the context of school psychology. Furthermore, the article will shed light on the data collection methods and the statistical techniques used for analysis, as well as the ethical considerations and challenges that researchers encounter in single-case experiments.

In sum, this article aims to provide an in-depth understanding of single-case experimental design and its pivotal role in advancing knowledge in psychology, particularly within the field of school psychology. Single-case experimental design serves as a bridge between rigorous scientific inquiry and the real-world needs of individuals, making it an indispensable asset in enhancing the quality of psychological research and practice.

Understanding Single-Case Experimental Design

Single-Case Experimental Design (SCED), often referred to as “N of 1” research, is a research methodology employed in psychology to investigate behavioral and psychological phenomena with an emphasis on the individual subject as the primary unit of analysis. The primary purpose of SCED is to meticulously study the behavior, responses, and changes within a single participant over time. Unlike traditional group-based research, SCED is tailored to the unique characteristics and needs of individual cases, enabling a more in-depth understanding of the variables under investigation.

The historical background of SCED can be traced back to the early 20th century when researchers like B.F. Skinner pioneered the development of operant conditioning and experimental analysis of behavior. Skinner’s work laid the groundwork for single-case experiments by emphasizing the importance of understanding the functional relations between behavior and environmental variables. Over the decades, SCED has evolved and gained prominence in various fields within psychology, notably in clinical and school psychology. Its relevance in school psychology is particularly noteworthy, as it offers a systematic and data-driven approach to address the diverse learning and behavioral needs of students. School psychologists use SCED to design and assess individualized interventions, evaluate the effectiveness of specific teaching strategies, and make informed decisions about special education programs.

Understanding SCED involves familiarity with key concepts and terminology that underpin the methodology. These terms include:

  • Baseline: The initial phase of data collection where the participant’s behavior is measured before any intervention is introduced. Baseline data serve as a point of reference for assessing the impact of subsequent interventions.
  • Intervention: The phase in which a specific treatment, manipulation, or condition is introduced to the participant. The goal of the intervention is to bring about a change in the target behavior.
  • Dependent Variables: These are the behaviors or responses under investigation. They are the outcomes that researchers aim to measure and analyze for changes across different phases of the experiment.

Reliability and validity are critical considerations in SCED. Reliability refers to the consistency and stability of measurement. In SCED, it is crucial to ensure that data collection procedures are reliable, as any variability can affect the interpretation of results. Validity pertains to the accuracy and truthfulness of the data. Researchers must establish that the dependent variable measurements are valid and accurately reflect the behavior of interest. Applying these principles enhances the scientific rigor and credibility of the research findings, which is essential in both clinical and school psychology contexts.

This foundation of key concepts and terminology serves as the basis for designing, conducting, and interpreting single-case experiments, ensuring that the methodology maintains high standards of precision and integrity in the pursuit of understanding individual behavior and psychological processes.

Types of Single-Case Experimental Designs

The AB Design is one of the fundamental single-case experimental designs, characterized by its simplicity and effectiveness. In an AB Design, the researcher observes and measures a single subject’s behavior during two distinct phases: the baseline (A) phase and the intervention (B) phase. During the baseline phase, the researcher collects data on the subject’s behavior without any intervention or treatment. This baseline data serve as a reference point to understand the natural or typical behavior of the individual. Following the baseline phase, the intervention or treatment is introduced, and data on the subject’s behavior are collected again. The AB Design allows for the comparison of baseline data with intervention data, enabling researchers to determine whether the introduced intervention had a noticeable impact on the individual’s behavior.

AB Designs find extensive application in school psychology. For instance, consider a scenario where a school psychologist wishes to assess the effectiveness of a time-management training program for a student with attention deficit hyperactivity disorder (ADHD). During the baseline phase, the psychologist observes the student’s on-task behavior in the absence of any specific time-management training. Subsequently, during the intervention phase, the psychologist implements the time-management program and measures the student’s on-task behavior again. By comparing the baseline and intervention data, the psychologist can evaluate the program’s efficacy in improving the student’s behavior.
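As a minimal illustration of the A-B comparison described above, the sketch below computes the change in level between the two phases for hypothetical on-task data. The percentages, phase lengths, and variable names are assumptions made for the example, not values drawn from any study.

```python
# Minimal sketch of an A-B level comparison for hypothetical on-task data.
# All values are invented for illustration.

baseline = [35, 40, 30, 38, 33]        # phase A: percent of intervals on task
intervention = [55, 62, 58, 66, 70]    # phase B: after time-management training

def phase_mean(values):
    return sum(values) / len(values)

change_in_level = phase_mean(intervention) - phase_mean(baseline)
print(f"Baseline mean:     {phase_mean(baseline):.1f}%")
print(f"Intervention mean: {phase_mean(intervention):.1f}%")
print(f"Change in level:   {change_in_level:+.1f} percentage points")
```

A change in level is only one feature examined in such comparisons; trend and variability within each phase also matter when judging whether the intervention made a difference.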

The ABA Design is another prominent single-case experimental design characterized by the inclusion of a reversal (A) phase. In this design, the researcher initially collects baseline data (Phase A), introduces the intervention (Phase B), and then returns to the baseline conditions (Phase A). The ABA Design is significant because it provides an opportunity to assess the reversibility of the effects of the intervention. If the behavior returns to baseline levels during the second A phase, it suggests a strong relationship between the intervention and the observed changes in behavior.

In school psychology, the ABA Design offers valuable insights into the effectiveness of interventions for students with diverse needs. For instance, a school psychologist may use the ABA Design to evaluate a behavior modification program for a student with autism spectrum disorder (ASD). During the first baseline phase (A), the psychologist observes the student’s behavior patterns. Subsequently, in the intervention phase (B), a behavior modification program is implemented. If the student’s behavior shows positive changes, this suggests that the program is effective. Finally, during the second baseline phase (A), the psychologist can determine if the changes are reversible, which informs decisions regarding the program’s ongoing use or modification.
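To make the reversal logic concrete, here is a small sketch using hypothetical session counts (not data from any study) that compares the three phase means and checks whether behavior returned toward the original baseline level once the program was withdrawn.

```python
# Hypothetical A-B-A (reversal) data; all values are invented for illustration.
phases = {
    "A1": [8, 7, 9, 8],   # baseline: disruptive acts per session
    "B":  [3, 2, 2, 1],   # behavior-modification program in place
    "A2": [7, 8, 7, 9],   # program withdrawn (return to baseline conditions)
}

means = {name: sum(vals) / len(vals) for name, vals in phases.items()}
for name, m in means.items():
    print(f"Phase {name}: mean = {m:.1f}")

# Reversal logic: a clear shift during B plus a return toward the A1 level in A2
# is the pattern that supports a functional relation between program and behavior.
shift_in_B = abs(means["B"] - means["A1"])
return_toward_baseline = abs(means["A2"] - means["A1"])
print("Pattern consistent with a reversible intervention effect:",
      return_toward_baseline < shift_in_B)
```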

The Multiple Baseline Design is a versatile single-case experimental design that addresses challenges such as ethical concerns or logistical constraints that might limit the use of reversal designs. In this design, researchers stagger the introduction of the intervention across multiple behaviors, settings, or individuals. Each baseline and intervention phase is implemented at different times for each behavior, allowing researchers to establish a cause-and-effect relationship by demonstrating that the intervention corresponds with changes in the specific behavior under investigation.

Within school psychology, Multiple Baseline Designs offer particular utility when assessing interventions for students in complex or sensitive situations. For example, a school psychologist working with a student who displays challenging behaviors may choose to implement a Multiple Baseline Design. The psychologist can introduce a behavior intervention plan (BIP) for different target behaviors, such as aggression, noncompliance, and self-injury, at different times. By measuring and analyzing changes in behavior across these multiple behaviors, the psychologist can assess the effectiveness of the BIP and make informed decisions about its implementation across various behavioral concerns. This design is particularly valuable when ethical considerations prevent the reversal of an effective intervention, as it allows researchers to demonstrate the intervention’s impact without removing a beneficial treatment.
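The sketch below shows one way a multiple-baseline dataset can be organized and summarized: the hypothetical behavior intervention plan starts at a different session for each target behavior, and the change in each behavior should coincide with its own staggered start. The session counts and start points are invented for illustration.

```python
# Sketch of a multiple-baseline dataset: the intervention (a hypothetical BIP)
# begins at a different session for each target behavior. Counts are invented.

start_of_intervention = {"aggression": 4, "noncompliance": 7, "self-injury": 10}

data = {
    "aggression":    [6, 7, 6, 2, 1, 1, 1, 0, 1, 0, 0, 1],
    "noncompliance": [9, 8, 9, 8, 9, 8, 3, 2, 2, 1, 2, 1],
    "self-injury":   [4, 5, 4, 4, 5, 4, 5, 4, 4, 1, 0, 1],
}

for behavior, counts in data.items():
    start = start_of_intervention[behavior]
    baseline = counts[: start - 1]       # sessions before the staggered start
    treatment = counts[start - 1 :]      # sessions from the start onward
    drop = sum(baseline) / len(baseline) - sum(treatment) / len(treatment)
    print(f"{behavior:13s} intervention began session {start}; "
          f"mean drop = {drop:.1f} per session")
```

Because each behavior changes only after its own intervention point, the staggered pattern itself, rather than any single comparison, carries the causal argument.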

Conducting and Analyzing Single-Case Experiments

In single-case experiments, data collection and measurement are pivotal components that underpin the scientific rigor of the research. Data are typically collected through direct observation, self-reports, or the use of various measuring instruments, depending on the specific behavior or variable under investigation. To ensure reliability and validity, researchers meticulously define and operationalize the target behavior, specifying how it will be measured. This may involve the use of checklists, rating scales, video recordings, or other data collection tools. In school psychology research, systematic data collection is imperative to make informed decisions about interventions and individualized education plans (IEPs). It provides school psychologists with empirical evidence to track the progress of students, assess the effectiveness of interventions, and adapt strategies based on the collected data.

Visual analysis is a core element of interpreting data in single-case experiments. Researchers plot the data in graphs, creating visual representations of the behavior across different phases. By visually inspecting the data, researchers can identify patterns, trends, and changes in behavior. Visual analysis is particularly well-suited for detecting whether an intervention has had a noticeable effect.
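For readers who want to reproduce this kind of display programmatically, the following sketch plots invented A-B data with a dashed phase-change line using matplotlib (assumed to be installed); it is the basic graph on which visual analysis of level, trend, and variability relies.

```python
# Minimal A-B phase graph for visual analysis; data values are invented.
import matplotlib.pyplot as plt

baseline = [7, 6, 7, 8, 7]         # phase A
intervention = [5, 4, 3, 3, 2, 2]  # phase B
scores = baseline + intervention
sessions = range(1, len(scores) + 1)

plt.plot(sessions, scores, marker="o", color="black")
plt.axvline(x=len(baseline) + 0.5, linestyle="--", color="gray")  # phase-change line
plt.ylim(0, max(scores) + 2)
plt.text(1, max(scores) + 1, "A (baseline)")
plt.text(len(baseline) + 1, max(scores) + 1, "B (intervention)")
plt.xlabel("Session")
plt.ylabel("Frequency of target behavior")
plt.title("A-B phase graph")
plt.show()
```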

In addition to visual analysis, statistical methods are occasionally employed in single-case experiments to enhance the rigor of analysis. These methods include effect size calculations and phase change calculations. Effect size measures, such as Cohen’s d or Tau-U, quantify the magnitude of change between the baseline and intervention phases, providing a quantitative understanding of the treatment’s impact. Phase change calculations determine the statistical significance of behavior change across different phases, aiding in the determination of whether the intervention had a meaningful effect.
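As a rough illustration of such calculations, the sketch below computes a between-phase standardized mean difference (a Cohen's-d-style index) and Nonoverlap of All Pairs (NAP) for invented A-B data. These are generic textbook formulas rather than ones prescribed here, and they ignore trend and autocorrelation, both of which matter in real single-case analyses.

```python
# Two quantities often reported for A-B data: a pooled-SD standardized mean
# difference and Nonoverlap of All Pairs (NAP). Data values are invented.
from statistics import mean, stdev

baseline = [7, 6, 7, 8, 7]
intervention = [5, 4, 3, 3, 2, 2]

# Pooled-SD standardized mean difference between phases (negative = decrease).
n_a, n_b = len(baseline), len(intervention)
pooled_sd = (((n_a - 1) * stdev(baseline) ** 2 + (n_b - 1) * stdev(intervention) ** 2)
             / (n_a + n_b - 2)) ** 0.5
d = (mean(intervention) - mean(baseline)) / pooled_sd

# NAP: proportion of all baseline-intervention pairs showing improvement,
# where "improvement" here means a decrease because the behavior is undesirable.
pairs = [(a, b) for a in baseline for b in intervention]
nap = sum(1.0 if b < a else 0.5 if b == a else 0.0 for a, b in pairs) / len(pairs)

print(f"Between-phase d = {d:.2f}")
print(f"NAP (improvement = decrease) = {nap:.2f}")
```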

Visual analysis and statistical methods complement each other, enabling researchers in school psychology to draw more robust conclusions about the efficacy of interventions. These methods are valuable in making data-driven decisions regarding students’ educational and behavioral progress.

Single-case experimental designs are not without their challenges and limitations. Researchers must grapple with issues such as the potential for confounding variables, limited generalizability to other cases, and the need for careful control of extraneous factors. In school psychology, these challenges are compounded by the dynamic and diverse nature of educational settings, making it essential for researchers to adapt the methodology to specific contexts and populations.

Moreover, ethical considerations loom large in school psychology research. Researchers must adhere to strict ethical guidelines when conducting single-case experiments involving students. Informed consent, confidentiality, and the well-being of the participants are paramount. Ethical considerations are especially critical when conducting research with vulnerable populations, such as students with disabilities or those in special education programs. The ethical conduct of research in school psychology is pivotal to maintaining trust and ensuring the welfare of students and their families.

Overall, the application of single-case experimental design in school psychology research is a powerful approach for addressing individualized educational and behavioral needs. By emphasizing systematic data collection, employing visual analysis and statistical methods, and navigating the inherent challenges and ethical considerations, researchers can contribute to the advancement of knowledge in this field while ensuring the well-being and progress of the students they serve.

In conclusion, this article has provided a comprehensive exploration of Single-Case Experimental Design (SCED) and its vital role within the domain of school psychology. Key takeaways from this article underscore the significance of SCED as a versatile and invaluable research methodology:

First and foremost, SCED is a methodological cornerstone for investigating individual behavior and psychological phenomena. Through meticulous observation and data collection, it enables researchers to gain deep insights into the idiosyncratic needs and responses of students in educational settings.

The significance of SCED in school psychology is pronounced. It empowers school psychologists to design and assess tailored interventions, evaluate the effectiveness of educational programs, and make data-driven decisions that enhance the quality of education for students with diverse needs. Whether it’s tracking progress, assessing the efficacy of behavioral interventions, or individualizing education plans, SCED plays an instrumental role in achieving these goals.

Furthermore, the article has illuminated three primary types of single-case experimental designs: AB, ABA, and Multiple Baseline. These designs offer the flexibility to investigate the effects of interventions and assess their reversibility when required. Such methods have a direct and tangible impact on the daily practices of school psychologists, allowing them to optimize support and educational strategies.

The importance of systematic data collection and measurement, the role of visual analysis and statistical methods in data interpretation, and the acknowledgment of ethical considerations in school psychology research have been underscored. These aspects collectively serve as the foundation of SCED, ensuring the integrity and reliability of research outcomes.

As we look toward the future, the potential developments in SCED are promising. Advances in technology, such as wearable devices and digital data collection tools, offer new possibilities for precise and efficient data gathering. Additionally, the integration of SCED with other research methodologies, such as mixed-methods research, holds the potential to provide a more comprehensive understanding of students’ educational experiences.

In summary, Single-Case Experimental Design is a pivotal research methodology that bridges the gap between rigorous scientific inquiry and the real-world needs of students in school psychology. Its power lies in its capacity to assess, refine, and individualize interventions and educational plans. The continued application and refinement of SCED in school psychology research promise to contribute significantly to the advancement of knowledge and the enhancement of educational outcomes for students of all backgrounds and abilities. As we move forward, the integration of SCED with emerging technologies and research paradigms will continue to shape the landscape of school psychology research, leading to more effective and tailored interventions for the benefit of students and the field as a whole.




Educational Research Basics by Del Siegle

Single Subject Research

Single subject research (also known as single case experiments) is popular in the fields of special education and counseling. This research design is useful when the researcher is attempting to change the behavior of an individual or a small group of individuals and wishes to document that change. Unlike true experiments where the researcher randomly assigns participants to a control and treatment group, in single subject research the participant serves as both the control and treatment group. The researcher uses line graphs to show the effects of a particular intervention or treatment. An important factor of single subject research is that only one variable is changed at a time. Single subject research designs are “weak when it comes to external validity….Studies involving single-subject designs that show a particular treatment to be effective in changing behavior must rely on replication–across individuals rather than groups–if such results are to be found worthy of generalization” (Fraenkel & Wallen, 2006, p. 318).

Suppose a researcher wished to investigate the effect of praise on reducing disruptive behavior over many days. First she would need to establish a baseline of how frequently the disruptions occurred. She would measure how many disruptions occurred each day for several days. In the example below, the target student was disruptive seven times on the first day, six times on the second day, and seven times on the third day. Note how the sequence of time is depicted on the x-axis (horizontal axis) and the dependent variable (outcome variable) is depicted on the y-axis (vertical axis).

[Figure: line graph of the baseline phase, with days on the x-axis and number of disruptions (7, 6, and 7) on the y-axis]

Once a baseline of behavior has been established (when a consistent pattern emerges with at least three data points), the intervention begins. The researcher continues to plot the frequency of behavior while implementing the intervention of praise.

[Figure: line graph continuing across the baseline and praise-intervention phases]

In this example, we can see that the frequency of disruptions decreased once praise began. The design in this example is known as an A-B design. The baseline period is referred to as A and the intervention period is identified as B.

[Figure: A-B design graph with the baseline (A) and intervention (B) phases labeled]
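One convenient way to prepare such data for graphing (in Excel or any charting tool) is a simple day-by-phase table. The sketch below writes one to a CSV file; the baseline counts (7, 6, 7) come from the example above, while the intervention counts and the file name are invented for illustration.

```python
# Lay out the praise example as a tidy table that can be charted in Excel.
import csv

baseline = [7, 6, 7]          # phase A: disruptions per day (from the example)
intervention = [4, 3, 2, 2]   # phase B: after praise begins (hypothetical)

rows = [("day", "phase", "disruptions")]
for day, count in enumerate(baseline + intervention, start=1):
    phase = "A (baseline)" if day <= len(baseline) else "B (praise)"
    rows.append((day, phase, count))

with open("ab_design_data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(f"Wrote {len(rows) - 1} observations to ab_design_data.csv")
```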

Another design is the A-B-A design. An A-B-A design (also known as a reversal design) involves discontinuing the intervention and returning to a nontreatment condition.

[Figure: A-B-A (reversal) design graph]

Sometimes an individual’s behavior is so severe that the researcher cannot wait to establish a baseline and must begin with an intervention. In this case, a B-A-B design is used. The intervention is implemented immediately (before establishing a baseline). This is followed by a measurement without the intervention and then a repeat of the intervention.

[Figure: B-A-B design graph]

Multiple-Baseline Design

Sometimes, a researcher may be interested in addressing several issues for one student or a single issue for several students. In this case, a multiple-baseline design is used.

“In a multiple baseline across subjects design, the researcher introduces the intervention to different persons at different times. The significance of this is that if a behavior changes only after the intervention is presented, and this behavior change is seen successively in each subject’s data, the effects can more likely be credited to the intervention itself as opposed to other variables. Multiple-baseline designs do not require the intervention to be withdrawn. Instead, each subject’s own data are compared between intervention and nonintervention behaviors, resulting in each subject acting as his or her own control (Kazdin, 1982). An added benefit of this design, and all single-case designs, is the immediacy of the data. Instead of waiting until postintervention to take measures on the behavior, single-case research prescribes continuous data collection and visual monitoring of that data displayed graphically, allowing for immediate instructional decision-making. Students, therefore, do not linger in an intervention that is not working for them, making the graphic display of single-case research combined with differentiated instruction responsive to the needs of students.” (Geisler, Hessler, Gardner, & Lovelace, 2009)

[Figure: multiple-baseline design graphs, with the intervention introduced to different subjects at different times]

Regardless of the research design, the line graphs used to illustrate the data contain a set of common elements.

[Figure: common elements of a single-subject research line graph]

Generally, in single subject research we count the number of times something occurs in a given time period and see if it occurs more or less often in that time period after implementing an intervention. For example, we might measure how many baskets someone makes while shooting for 2 minutes. We would repeat that at least three times to get our baseline. Next, we would test some intervention. We might play music while shooting, give encouragement while shooting, or video the person while shooting to see if our intervention influenced the number of shots made. After the 3 baseline measurements (3 sets of 2 minute shooting), we would measure several more times (sets of 2 minute shooting) after the intervention and plot the time points (number of baskets made in 2 minutes for each of the measured time points). This works well for behaviors that are distinct and can be counted.
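A quick numerical companion to this kind of counting is to ask how many intervention trials exceed the best baseline trial, a simple nonoverlap summary sometimes reported alongside the graph. The basket counts below are invented for illustration.

```python
# How many intervention trials beat the best baseline trial? Counts are invented.
baseline_baskets = [9, 11, 10]            # three 2-minute baseline trials
intervention_baskets = [12, 14, 13, 15]   # trials with encouragement (hypothetical)

best_baseline = max(baseline_baskets)
exceeding = [x for x in intervention_baskets if x > best_baseline]
percent_nonoverlap = 100 * len(exceeding) / len(intervention_baskets)
print(f"{len(exceeding)} of {len(intervention_baskets)} intervention trials "
      f"exceed the best baseline trial ({percent_nonoverlap:.0f}% non-overlapping)")
```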

Sometimes behaviors come and go over time (such as being off task in a classroom or not listening during a coaching session). The way we can record these is to select a period of time (say 5 minutes) and mark down every 10 seconds whether our participant is on task. We make a minimum of three sets of 5-minute observations for a baseline, implement an intervention, and then make more sets of 5-minute observations with the intervention in place. We use this method rather than counting how many times someone is off task because a person who is continually off task would only receive a count of 1. Someone who is off task twice for 15 seconds would also be off task twice, for a score of 2, yet the second person is certainly not off task twice as much as the first. Therefore, recording whether the person is off task at 10-second intervals gives a more accurate picture. The person continually off task would have a score of 30 (off task at every 10-second interval for 5 minutes), and the person off task twice for a short time would have a score of 2 (off task during only 2 of the 10-second interval measures).
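The scoring rule just described is easy to express directly. The sketch below computes the interval score for the two hypothetical cases in the text: someone off task at every 10-second check and someone off task at only two checks.

```python
# Scoring a 5-minute observation with 10-second interval recording.
# Each entry marks whether the participant was off task at that check;
# the two records are invented to match the cases described in the text.

CHECKS_PER_OBSERVATION = 5 * 60 // 10   # 30 checks in a 5-minute observation

continually_off_task = [True] * CHECKS_PER_OBSERVATION
briefly_off_task = [False] * CHECKS_PER_OBSERVATION
briefly_off_task[5] = True    # off task at two isolated checks
briefly_off_task[20] = True

def interval_score(record):
    """Number of 10-second checks at which the participant was off task."""
    return sum(record)

print(interval_score(continually_off_task))  # 30: off task at every check
print(interval_score(briefly_off_task))      # 2: matches the text's example
```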

I also have additional information about how to record single-subject research data.

I hope this helps you better understand single subject research.

I have created a PowerPoint on Single Subject Research, which is also available below as a video.

I have also created instructions for creating single-subject research design graphs with Excel.

Fraenkel, J. R., & Wallen, N. E. (2006). How to design and evaluate research in education (6th ed.). Boston, MA: McGraw Hill.

Geisler, J. L., Hessler, T., Gardner, R., III, & Lovelace, T. S. (2009). Differentiated writing interventions for high-achieving urban African American elementary students. Journal of Advanced Academics, 20, 214–247.

Del Siegle, Ph.D., University of Connecticut, www.delsiegle.info

Revised 02/02/2024


Case Study vs. Single-Case Experimental Designs

What's the Difference?

Case study and single-case experimental designs are both research methods used in psychology and other social sciences to investigate individual cases or subjects. However, they differ in their approach and purpose. Case studies involve in-depth examination of a single case, such as an individual, group, or organization, to gain a comprehensive understanding of the phenomenon being studied. On the other hand, single-case experimental designs focus on studying the effects of an intervention or treatment on a single subject over time. These designs use repeated measures and control conditions to establish cause-and-effect relationships. While case studies provide rich qualitative data, single-case experimental designs offer more rigorous experimental control and allow for the evaluation of treatment effectiveness.

Further Detail

Introduction

When conducting research in various fields, it is essential to choose the appropriate study design to answer research questions effectively. Two commonly used designs are case study and single-case experimental designs. While both approaches aim to provide valuable insights into specific phenomena, they differ in several key attributes. This article will compare and contrast the attributes of case study and single-case experimental designs, highlighting their strengths and limitations.

Definition and Purpose

A case study is an in-depth investigation of a particular individual, group, or event. It involves collecting and analyzing qualitative or quantitative data to gain a comprehensive understanding of the subject under study. Case studies are often used to explore complex phenomena, generate hypotheses, or provide detailed descriptions of unique cases.

On the other hand, single-case experimental designs are a type of research design that focuses on studying a single individual or a small group over time. These designs involve manipulating an independent variable and measuring its effects on a dependent variable. Single-case experimental designs are particularly useful for examining cause-and-effect relationships and evaluating the effectiveness of interventions or treatments.

Data Collection and Analysis

In terms of data collection, case studies rely on various sources such as interviews, observations, documents, and artifacts. Researchers often employ multiple methods to gather rich and diverse data, allowing for a comprehensive analysis of the case. The data collected in case studies are typically qualitative in nature, although quantitative data may also be included.

In contrast, single-case experimental designs primarily rely on quantitative data collection methods. Researchers use standardized measures and instruments to collect data on the dependent variable before, during, and after the manipulation of the independent variable. This allows for a systematic analysis of the effects of the intervention or treatment on the individual or group being studied.

Generalizability

One of the key differences between case studies and single-case experimental designs is their generalizability. Case studies are often conducted on unique or rare cases, making it challenging to generalize the findings to a larger population. The focus of case studies is on providing detailed insights into specific cases rather than making broad generalizations.

On the other hand, single-case experimental designs aim to establish causal relationships and can provide evidence for generalizability. By systematically manipulating the independent variable and measuring its effects on the dependent variable, researchers can draw conclusions about the effectiveness of interventions or treatments that may be applicable to similar cases or populations.

Internal Validity

Internal validity refers to the extent to which a study accurately measures the cause-and-effect relationship between variables. In case studies, establishing internal validity can be challenging due to the lack of control over extraneous variables. The presence of multiple data sources and the potential for subjective interpretation may also introduce bias.

In contrast, single-case experimental designs prioritize internal validity by employing rigorous control over extraneous variables. Researchers carefully design the intervention or treatment, implement it consistently, and measure the dependent variable under controlled conditions. This allows for a more confident determination of the causal relationship between the independent and dependent variables.

Time and Resources

Case studies often require significant time and resources due to their in-depth nature. Researchers need to spend considerable time collecting and analyzing data from various sources, conducting interviews, and immersing themselves in the case. Additionally, case studies may involve multiple researchers or a research team, further increasing the required resources.

On the other hand, single-case experimental designs can be more time and resource-efficient. Since they focus on a single individual or a small group, data collection and analysis can be more streamlined. Researchers can also implement interventions or treatments in a controlled manner, reducing the time and resources needed for data collection.

Ethical Considerations

Both case studies and single-case experimental designs require researchers to consider ethical implications. In case studies, researchers must ensure the privacy and confidentiality of the individuals or groups being studied. Informed consent and ethical guidelines for data collection and analysis should be followed to protect the rights and well-being of the participants.

Similarly, in single-case experimental designs, researchers must consider ethical considerations when implementing interventions or treatments. The well-being and safety of the individual or group being studied should be prioritized, and informed consent should be obtained. Additionally, researchers should carefully monitor and evaluate the potential risks and benefits associated with the intervention or treatment.

Case studies and single-case experimental designs are valuable research approaches that offer unique insights into specific phenomena. While case studies provide in-depth descriptions and exploratory analyses of individual cases, single-case experimental designs focus on establishing causal relationships and evaluating interventions or treatments. Researchers should carefully consider the attributes and goals of their study when choosing between these two designs, ensuring that the selected approach aligns with their research questions and objectives.


Perspective (published 22 November 2022)

Single case studies are a powerful tool for developing, testing and extending theories

  • Lyndsey Nickels (ORCID: orcid.org/0000-0002-0311-3524)
  • Simon Fischer-Baum (ORCID: orcid.org/0000-0002-6067-0538)
  • Wendy Best (ORCID: orcid.org/0000-0001-8375-5916)

Nature Reviews Psychology, volume 1, pages 733–747 (2022)


Psychology embraces a diverse range of methodologies. However, most rely on averaging group data to draw conclusions. In this Perspective, we argue that single case methodology is a valuable tool for developing and extending psychological theories. We stress the importance of single case and case series research, drawing on classic and contemporary cases in which cognitive and perceptual deficits provide insights into typical cognitive processes in domains such as memory, delusions, reading and face perception. We unpack the key features of single case methodology, describe its strengths, its value in adjudicating between theories, and outline its benefits for a better understanding of deficits and hence more appropriate interventions. The unique insights that single case studies have provided illustrate the value of in-depth investigation within an individual. Single case methodology has an important place in the psychologist’s toolkit and it should be valued as a primary research tool.




Fayol, M. & Seron, X. On numerical representations. Insights from experimental, neuropsychological, and developmental research. In Handbook of Mathematical Cognition (ed. Campbell, J.) 3–23 (Psychological Press, 2005).

Bornstein, B. & Kidron, D. P. Prosopagnosia. J. Neurol. Neurosurg. Psychiat. 22 , 124–131 (1959).

Kühn, C. D., Gerlach, C., Andersen, K. B., Poulsen, M. & Starrfelt, R. Face recognition in developmental dyslexia: evidence for dissociation between faces and words. Cogn. Neuropsychol. 38 , 107–115 (2021).

Barton, J. J. S., Albonico, A., Susilo, T., Duchaine, B. & Corrow, S. L. Object recognition in acquired and developmental prosopagnosia. Cogn. Neuropsychol. 36 , 54–84 (2019).

Renault, B., Signoret, J.-L., Debruille, B., Breton, F. & Bolgert, F. Brain potentials reveal covert facial recognition in prosopagnosia. Neuropsychologia 27 , 905–912 (1989).

Bauer, R. M. Autonomic recognition of names and faces in prosopagnosia: a neuropsychological application of the guilty knowledge test. Neuropsychologia 22 , 457–469 (1984).

Haan, E. H. F., de, Young, A. & Newcombe, F. Face recognition without awareness. Cogn. Neuropsychol. 4 , 385–415 (1987).

Ellis, H. D. & Lewis, M. B. Capgras delusion: a window on face recognition. Trends Cogn. Sci. 5 , 149–156 (2001).

Ellis, H. D., Young, A. W., Quayle, A. H. & De Pauw, K. W. Reduced autonomic responses to faces in Capgras delusion. Proc. R. Soc. Lond. B 264 , 1085–1092 (1997).

Collins, M. N., Hawthorne, M. E., Gribbin, N. & Jacobson, R. Capgras’ syndrome with organic disorders. Postgrad. Med. J. 66 , 1064–1067 (1990).

Enoch, D., Puri, B. K. & Ball, H. Uncommon Psychiatric Syndromes 5th edn (Routledge, 2020).

Tranel, D., Damasio, H. & Damasio, A. R. Double dissociation between overt and covert face recognition. J. Cogn. Neurosci. 7 , 425–432 (1995).

Brighetti, G., Bonifacci, P., Borlimi, R. & Ottaviani, C. “Far from the heart far from the eye”: evidence from the Capgras delusion. Cogn. Neuropsychiat. 12 , 189–197 (2007).

Coltheart, M., Langdon, R. & McKay, R. Delusional belief. Annu. Rev. Psychol. 62 , 271–298 (2011).

Coltheart, M. Cognitive neuropsychiatry and delusional belief. Q. J. Exp. Psychol. 60 , 1041–1062 (2007).

Coltheart, M. & Davies, M. How unexpected observations lead to new beliefs: a Peircean pathway. Conscious. Cogn. 87 , 103037 (2021).

Coltheart, M. & Davies, M. Failure of hypothesis evaluation as a factor in delusional belief. Cogn. Neuropsychiat. 26 , 213–230 (2021).

McCloskey, M. et al. A developmental deficit in localizing objects from vision. Psychol. Sci. 6 , 112–117 (1995).

McCloskey, M., Valtonen, J. & Cohen Sherman, J. Representing orientation: a coordinate-system hypothesis and evidence from developmental deficits. Cogn. Neuropsychol. 23 , 680–713 (2006).

McCloskey, M. Spatial representations and multiple-visual-systems hypotheses: evidence from a developmental deficit in visual location and orientation processing. Cortex 40 , 677–694 (2004).

Gregory, E. & McCloskey, M. Mirror-image confusions: implications for representation and processing of object orientation. Cognition 116 , 110–129 (2010).

Gregory, E., Landau, B. & McCloskey, M. Representation of object orientation in children: evidence from mirror-image confusions. Vis. Cogn. 19 , 1035–1062 (2011).

Laine, M. & Martin, N. Cognitive neuropsychology has been, is, and will be significant to aphasiology. Aphasiology 26 , 1362–1376 (2012).

Howard, D. & Patterson, K. The Pyramids And Palm Trees Test: A Test Of Semantic Access From Words And Pictures (Thames Valley Test Co., 1992).

Kay, J., Lesser, R. & Coltheart, M. PALPA: Psycholinguistic Assessments Of Language Processing In Aphasia. 2: Picture & Word Semantics, Sentence Comprehension (Erlbaum, 2001).

Franklin, S. Dissociations in auditory word comprehension; evidence from nine fluent aphasic patients. Aphasiology 3 , 189–207 (1989).

Howard, D., Swinburn, K. & Porter, G. Putting the CAT out: what the comprehensive aphasia test has to offer. Aphasiology 24 , 56–74 (2010).

Conti-Ramsden, G., Crutchley, A. & Botting, N. The extent to which psychometric tests differentiate subgroups of children with SLI. J. Speech Lang. Hear. Res. 40 , 765–777 (1997).

Bishop, D. V. M. & McArthur, G. M. Individual differences in auditory processing in specific language impairment: a follow-up study using event-related potentials and behavioural thresholds. Cortex 41 , 327–341 (2005).

Bishop, D. V. M., Snowling, M. J., Thompson, P. A. & Greenhalgh, T., and the CATALISE-2 consortium. Phase 2 of CATALISE: a multinational and multidisciplinary Delphi consensus study of problems with language development: terminology. J. Child. Psychol. Psychiat. 58 , 1068–1080 (2017).

Wilson, A. J. et al. Principles underlying the design of ‘the number race’, an adaptive computer game for remediation of dyscalculia. Behav. Brain Funct. 2 , 19 (2006).

Basso, A. & Marangolo, P. Cognitive neuropsychological rehabilitation: the emperor’s new clothes? Neuropsychol. Rehabil. 10 , 219–229 (2000).

Murad, M. H., Asi, N., Alsawas, M. & Alahdab, F. New evidence pyramid. Evidence-based Med. 21 , 125–127 (2016).

Greenhalgh, T., Howick, J. & Maskrey, N., for the Evidence Based Medicine Renaissance Group. Evidence based medicine: a movement in crisis? Br. Med. J. 348 , g3725–g3725 (2014).

Best, W., Ping Sze, W., Edmundson, A. & Nickels, L. What counts as evidence? Swimming against the tide: valuing both clinically informed experimentally controlled case series and randomized controlled trials in intervention research. Evidence-based Commun. Assess. Interv. 13 , 107–135 (2019).

Best, W. et al. Understanding differing outcomes from semantic and phonological interventions with children with word-finding difficulties: a group and case series study. Cortex 134 , 145–161 (2021).

OCEBM Levels of Evidence Working Group. The Oxford Levels of Evidence 2. CEBM https://www.cebm.ox.ac.uk/resources/levels-of-evidence/ocebm-levels-of-evidence (2011).

Holler, D. E., Behrmann, M. & Snow, J. C. Real-world size coding of solid objects, but not 2-D or 3-D images, in visual agnosia patients with bilateral ventral lesions. Cortex 119 , 555–568 (2019).

Duchaine, B. C., Yovel, G., Butterworth, E. J. & Nakayama, K. Prosopagnosia as an impairment to face-specific mechanisms: elimination of the alternative hypotheses in a developmental case. Cogn. Neuropsychol. 23 , 714–747 (2006).

Hartley, T. et al. The hippocampus is required for short-term topographical memory in humans. Hippocampus 17 , 34–48 (2007).

Pishnamazi, M. et al. Attentional bias towards and away from fearful faces is modulated by developmental amygdala damage. Cortex 81 , 24–34 (2016).

Rapp, B., Fischer-Baum, S. & Miozzo, M. Modality and morphology: what we write may not be what we say. Psychol. Sci. 26 , 892–902 (2015).

Yong, K. X. X., Warren, J. D., Warrington, E. K. & Crutch, S. J. Intact reading in patients with profound early visual dysfunction. Cortex 49 , 2294–2306 (2013).

Rockland, K. S. & Van Hoesen, G. W. Direct temporal–occipital feedback connections to striate cortex (V1) in the macaque monkey. Cereb. Cortex 4 , 300–313 (1994).

Haynes, J.-D., Driver, J. & Rees, G. Visibility reflects dynamic changes of effective connectivity between V1 and fusiform cortex. Neuron 46 , 811–821 (2005).

Tanaka, K. Mechanisms of visual object recognition: monkey and human studies. Curr. Opin. Neurobiol. 7 , 523–529 (1997).

Fischer-Baum, S., McCloskey, M. & Rapp, B. Representation of letter position in spelling: evidence from acquired dysgraphia. Cognition 115 , 466–490 (2010).

Houghton, G. The problem of serial order: a neural network model of sequence learning and recall. In Current Research In Natural Language Generation (eds Dale, R., Mellish, C. & Zock, M.) 287–319 (Academic Press, 1990).

Fieder, N., Nickels, L., Biedermann, B. & Best, W. From “some butter” to “a butter”: an investigation of mass and count representation and processing. Cogn. Neuropsychol. 31 , 313–349 (2014).

Fieder, N., Nickels, L., Biedermann, B. & Best, W. How ‘some garlic’ becomes ‘a garlic’ or ‘some onion’: mass and count processing in aphasia. Neuropsychologia 75 , 626–645 (2015).

Schröder, A., Burchert, F. & Stadie, N. Training-induced improvement of noncanonical sentence production does not generalize to comprehension: evidence for modality-specific processes. Cogn. Neuropsychol. 32 , 195–220 (2015).

Stadie, N. et al. Unambiguous generalization effects after treatment of non-canonical sentence production in German agrammatism. Brain Lang. 104 , 211–229 (2008).

Schapiro, A. C., Gregory, E., Landau, B., McCloskey, M. & Turk-Browne, N. B. The necessity of the medial temporal lobe for statistical learning. J. Cogn. Neurosci. 26 , 1736–1747 (2014).

Schapiro, A. C., Kustner, L. V. & Turk-Browne, N. B. Shaping of object representations in the human medial temporal lobe based on temporal regularities. Curr. Biol. 22 , 1622–1627 (2012).

Baddeley, A., Vargha-Khadem, F. & Mishkin, M. Preserved recognition in a case of developmental amnesia: implications for the acaquisition of semantic memory? J. Cogn. Neurosci. 13 , 357–369 (2001).

Snyder, J. J. & Chatterjee, A. Spatial-temporal anisometries following right parietal damage. Neuropsychologia 42 , 1703–1708 (2004).

Ashkenazi, S., Henik, A., Ifergane, G. & Shelef, I. Basic numerical processing in left intraparietal sulcus (IPS) acalculia. Cortex 44 , 439–448 (2008).

Lebrun, M.-A., Moreau, P., McNally-Gagnon, A., Mignault Goulet, G. & Peretz, I. Congenital amusia in childhood: a case study. Cortex 48 , 683–688 (2012).

Vannuscorps, G., Andres, M. & Pillon, A. When does action comprehension need motor involvement? Evidence from upper limb aplasia. Cogn. Neuropsychol. 30 , 253–283 (2013).

Jeannerod, M. Neural simulation of action: a unifying mechanism for motor cognition. NeuroImage 14 , S103–S109 (2001).

Blakemore, S.-J. & Decety, J. From the perception of action to the understanding of intention. Nat. Rev. Neurosci. 2 , 561–567 (2001).

Rizzolatti, G. & Craighero, L. The mirror-neuron system. Annu. Rev. Neurosci. 27 , 169–192 (2004).

Forde, E. M. E., Humphreys, G. W. & Remoundou, M. Disordered knowledge of action order in action disorganisation syndrome. Neurocase 10 , 19–28 (2004).

Mazzi, C. & Savazzi, S. The glamor of old-style single-case studies in the neuroimaging era: insights from a patient with hemianopia. Front. Psychol. 10 , 965 (2019).

Coltheart, M. What has functional neuroimaging told us about the mind (so far)? (Position Paper Presented to the European Cognitive Neuropsychology Workshop, Bressanone, 2005). Cortex 42 , 323–331 (2006).

Page, M. P. A. What can’t functional neuroimaging tell the cognitive psychologist? Cortex 42 , 428–443 (2006).

Blank, I. A., Kiran, S. & Fedorenko, E. Can neuroimaging help aphasia researchers? Addressing generalizability, variability, and interpretability. Cogn. Neuropsychol. 34 , 377–393 (2017).

Niv, Y. The primacy of behavioral research for understanding the brain. Behav. Neurosci. 135 , 601–609 (2021).

Crawford, J. R. & Howell, D. C. Comparing an individual’s test score against norms derived from small samples. Clin. Neuropsychol. 12 , 482–486 (1998).

Crawford, J. R., Garthwaite, P. H. & Ryan, K. Comparing a single case to a control sample: testing for neuropsychological deficits and dissociations in the presence of covariates. Cortex 47 , 1166–1178 (2011).

McIntosh, R. D. & Rittmo, J. Ö. Power calculations in single-case neuropsychology: a practical primer. Cortex 135 , 146–158 (2021).

Patterson, K. & Plaut, D. C. “Shallow draughts intoxicate the brain”: lessons from cognitive science for cognitive neuropsychology. Top. Cogn. Sci. 1 , 39–58 (2009).

Lambon Ralph, M. A., Patterson, K. & Plaut, D. C. Finite case series or infinite single-case studies? Comments on “Case series investigations in cognitive neuropsychology” by Schwartz and Dell (2010). Cogn. Neuropsychol. 28 , 466–474 (2011).

Horien, C., Shen, X., Scheinost, D. & Constable, R. T. The individual functional connectome is unique and stable over months to years. NeuroImage 189 , 676–687 (2019).

Epelbaum, S. et al. Pure alexia as a disconnection syndrome: new diffusion imaging evidence for an old concept. Cortex 44 , 962–974 (2008).

Fischer-Baum, S. & Campana, G. Neuroplasticity and the logic of cognitive neuropsychology. Cogn. Neuropsychol. 34 , 403–411 (2017).

Paul, S., Baca, E. & Fischer-Baum, S. Cerebellar contributions to orthographic working memory: a single case cognitive neuropsychological investigation. Neuropsychologia 171 , 108242 (2022).

Feinstein, J. S., Adolphs, R., Damasio, A. & Tranel, D. The human amygdala and the induction and experience of fear. Curr. Biol. 21 , 34–38 (2011).

Crawford, J., Garthwaite, P. & Gray, C. Wanted: fully operational definitions of dissociations in single-case studies. Cortex 39 , 357–370 (2003).

McIntosh, R. D. Simple dissociations for a higher-powered neuropsychology. Cortex 103 , 256–265 (2018).

McIntosh, R. D. & Brooks, J. L. Current tests and trends in single-case neuropsychology. Cortex 47 , 1151–1159 (2011).

Best, W., Schröder, A. & Herbert, R. An investigation of a relative impairment in naming non-living items: theoretical and methodological implications. J. Neurolinguistics 19 , 96–123 (2006).

Franklin, S., Howard, D. & Patterson, K. Abstract word anomia. Cogn. Neuropsychol. 12 , 549–566 (1995).

Coltheart, M., Patterson, K. E. & Marshall, J. C. Deep Dyslexia (Routledge, 1980).

Nickels, L., Kohnen, S. & Biedermann, B. An untapped resource: treatment as a tool for revealing the nature of cognitive processes. Cogn. Neuropsychol. 27 , 539–562 (2010).

Download references

Acknowledgements

The authors thank all of those pioneers of and advocates for single case study research who have mentored, inspired and encouraged us over the years, and the many other colleagues with whom we have discussed these issues.

Author information

Authors and affiliations.

School of Psychological Sciences & Macquarie University Centre for Reading, Macquarie University, Sydney, New South Wales, Australia

Lyndsey Nickels

NHMRC Centre of Research Excellence in Aphasia Recovery and Rehabilitation, Australia

Psychological Sciences, Rice University, Houston, TX, USA

Simon Fischer-Baum

Psychology and Language Sciences, University College London, London, UK

Wendy Best


Contributions

L.N. led and was primarily responsible for the structuring and writing of the manuscript. All authors contributed to all aspects of the article.

Corresponding author

Correspondence to Lyndsey Nickels .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Reviews Psychology thanks Yanchao Bi, Rob McIntosh, and the other, anonymous, reviewer for their contribution to the peer review of this work.


About this article

Cite this article.

Nickels, L., Fischer-Baum, S. & Best, W. Single case studies are a powerful tool for developing, testing and extending theories. Nat Rev Psychol 1 , 733–747 (2022). https://doi.org/10.1038/s44159-022-00127-y


Accepted : 13 October 2022

Published : 22 November 2022

Issue Date : December 2022

DOI : https://doi.org/10.1038/s44159-022-00127-y



Single-Case Designs

  • First Online: 30 April 2023


  • Lodi Lipien,
  • Megan Kirby &
  • John M. Ferron

Part of the book series: Autism and Child Psychopathology Series (ACPS)


Single-case design (SCD), also known as single-case experimental design, single-subject design, or N-of-1 trials, refers to a research methodology that involves examining the effect of an intervention on a single individual over time by repeatedly measuring a target behavior across different intervention conditions. These designs may include replication across cases, but the focus is on individual effects. Differences in the target behaviors and individuals studied, as well as differences in the research questions posed, have spurred the development of a variety of single-case designs, each with distinct advantages in specific situations. These designs include reversal designs, multiple baseline designs (MBD), alternating treatments designs (ATD), and changing criterion designs (CCD). Our purpose is to describe these designs and their application in behavioral research. In doing so, we consider the questions they address and the conditions under which they are well suited to answer those questions.
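To make the phase logic of these designs concrete, the sketch below (Python, with invented numbers) organizes hypothetical observations from an ABAB withdrawal design and summarizes the level of the target behavior in each phase, the kind of phase-by-phase comparison that underpins visual analysis. The session values, phase labels, and helper function are illustrative assumptions rather than data from any study discussed here.

```python
# Hypothetical ABAB (withdrawal) design: repeated measurements of one
# target behavior for one participant across alternating baseline (A)
# and intervention (B) phases. All values are invented for illustration.

sessions = {
    "A1": [7, 8, 6, 7, 8],   # initial baseline
    "B1": [3, 2, 3, 1, 2],   # intervention introduced
    "A2": [6, 7, 7, 8, 6],   # intervention withdrawn (return to baseline)
    "B2": [2, 1, 2, 2, 1],   # intervention reintroduced
}

def phase_mean(values):
    """Average level of the target behavior within a phase."""
    return sum(values) / len(values)

for phase, values in sessions.items():
    print(f"{phase}: data = {values}, mean = {phase_mean(values):.1f}")

# A convincing demonstration requires the level to shift each time the
# intervention is introduced or withdrawn: A1 vs. B1, B1 vs. A2, and
# A2 vs. B2 should all change in the predicted direction.
```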


Reversal designs, first described by Leitenberg (1973) and later reviewed by Wine et al. (2015), originally referred to a type of design in which the effects of one IV on two topographically distinct DVs (DV 1, DV 2) were repeatedly measured across time. The intervention, such as reinforcement, was presented in each phase but was in effect for either DV 1 or DV 2. The purpose of this arrangement is to show changes in rates of responding when the IV is introduced to one DV and withdrawn from the other: the rate of responding for each DV changes across phases depending on whether the IV is present or absent. However, the reversal design as originally described is rarely used in the contemporary behavior-analytic literature, and the term is now often used interchangeably with withdrawal design.

Alberto, P. A., & Troutman, A. C. (2009). Applied behavior analysis for teachers (8th ed.). Pearson Education.


Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1 , 91–97.


Barlow, D. H., & Hersen, M. (1984). Single case experimental designs: Strategies for studying behavior change . Pergamon.

Blair, B. J., Weiss, J. S., & Ahern, W. H. (2018). A comparison of task analysis training procedures. Education and Treatment of Children, 41 (3), 357–370.


Bolanos, J. E., Reeve, K. F., Reeve, S. A., Sidener, T. M., Jennings, A. M., & Ostrosky, B. D. (2020). Using stimulus equivalence-based instruction to teach young children to sort recycling, trash, and compost items. Behavior and Social Issues, 29, 78. https://doi.org/10.1007/s42822-020-00028-w

Byiers, B., Reichle, J., & Symons, F. J. (2012). Single-subject experimental design for evidence-based practice. American Journal of Speech-Language Pathology, 21 (4), 397–414.


Craig, A. R., & Fisher, W. W. (2019). Randomization tests as alternative analysis methods for behavior-analytic data. Journal of the Experimental Analysis of Behavior, 111(2), 309–328. https://doi.org/10.1002/jeab.500

Critchfield, T. S., & Shue, E. Z. H. (2018). The dead man test: A preliminary experimental analysis. Behavior Analysis in Practice, 11 , 381–384. https://doi.org/10.1007/s40617-018-0239-7

Engel, R. J., & Schutt, R. K. (2013). The practice of research in social work (3rd ed.). Sage Publications, Inc.

Ferron, J. M., & Jones, P. (2006). Tests for the visual analysis of response-guided multiple-baseline data. Journal of Experimental Education, 75 , 66–81.

Ferron, J. M., Rohrer, L. L., & Levin, J. R. (2019). Randomization procedures for changing criterion designs. Behavior Modification . Advance online publication. https://doi.org/10.1177/0145445519847627

Ferron, J., Goldstein, H., Olszewski, & Rohrer, L. (2020). Indexing effects in single-case experimental designs by estimating the percent of goal obtained. Evidence-Based Communication Assessment and Intervention, 14, 6–27. https://doi.org/10.1080/17489539.2020.1732024

Fontenot, B., Uwayo, M., Avendano, S. M., & Ross, D. (2019). A descriptive analysis of applied behavior analysis research with economically disadvantaged children. Behavior Analysis in Practice, 12 , 782–794.

Fuqua, R. W., & Schwade, J. (1986). Social validation of applied behavioral research. In A. Poling & R. W. Fuqua (Eds.), Research methods in applied behavior analysis . Springer. https://doi.org/10.1007/978-1-4684-8786-2_2


Gast, D. L., & Ledford, J. R. (2014). Single case research methodology: Applications in special education and behavioral sciences (2nd ed.). Routledge.


Gosens, L. C. F., Otten, R., Didden, R., & Poelen, E. A. P. (2020). Evaluating a personalized treatment for substance use disorder in people with mild intellectual disability or borderline intellectual functioning: A study protocol of a multiple baseline across individuals design. Contemporary Clinical Trials Communications, 19 , 100616.

Hartmann, D. P., & Hall, R. V. (1976). The changing criterion design. Journal of Applied Behavior Analysis, 9 , 527–532. https://doi.org/10.1901/jaba.1976.9-527

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71 (2), 165–179.

Johnston, J. M., & Pennypacker, H. S., Jr. (1980). Strategies and tactics of behavioral research . L. Erlbaum Associates.

Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1 , 427–452.

Klein, L. A., Houlihan, D., Vincent, J. L., & Panahon, C. J. (2015). Best practices in utilizing the changing criterion design. Behavior Analysis in Practice, 10 (1), 52–61. https://doi.org/10.1007/s40617-014-0036-x

Koehler, M. J., & Levin, J. R. (1998). Regulated randomization: A potentially sharper analytical tool for the multiple baseline design. Psychological Methods, 3 , 206–217.

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15 (2), 124–144. https://doi.org/10.1037/a0017736

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M, & Shadish, W. R. (2010). Single-case designs technical documentation . Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf

Ledford, J. R., Barton, E. E., Severini, K. E., & Zimmerman, K. N. (2019). A primer on single-case research designs: Contemporary use and analysis. American Journal on Intellectual and Developmental Disabilities, 124 (1), 35–56.

Leitenberg, H. (1973). The use of single-case methodology in psychotherapy research. Journal of Abnormal Psychology, 82 , 87–101.

Li, A., Wallace, L., Ehrhardt, K. E., & Poling, A. (2017). Reporting participant characteristics in intervention articles published in five behavior-analytic journals, 2013–2015. Behavior Analysis: Research and Practice, 17 (1), 84–91.

Lobo, M. A., Moeyaert, M., Baraldi Cunha, A., & Babik, I. (2017). Single-case design, analysis, and quality assessment for intervention research. Journal of Neurologic Physical Therapy, 41 (3), 187–197. https://doi.org/10.1097/NPT.0000000000000187

McDougall, D. (2005). The range-bound changing criterion design. Behavioral Interventions, 20 , 129–137.

McDougall, D., Hawkins, J., Brady, M., & Jenkins, A. (2006). Recent innovations in the changing criterion design: Implications for research and practice in special education. Journal of Special Education, 40 (1), 2–15.

Moeyaert, M., Ferron, J., Beretvas, S. N., & Van den Noortgate, W. (2014). From a single-level analysis to a multilevel analysis of single-case experimental designs. Journal of School Psychology, 52 , 191–211.

Morgan, D. L., & Morgan, R. K. (2009). Single-case research methods for the behavioral and health sciences . Sage Publications.

Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71 (2), 137–148.

Onghena, P. (1992). Randomization tests for extensions and variations of ABAB single-case experimental designs: A rejoinder. Behavioral Assessment, 14 , 153–171.

Onghena, P. (2005). Single-case designs. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science . Wiley. https://doi.org/10.1002/0470013192

Onghena, P., Tanious, R., De, T. K., & Michiels, B. (2019). Randomization tests for changing criterion designs. Behaviour Research and Therapy, 117 , 18. https://doi.org/10.1016/j.brat.2019.01.005

Parker, R. I., Vannest, K. J., & Davis, J. L. (2011). Effect size in single-case research: A review of nine nonoverlap techniques. Behavior Modification, 35 , 303–322.

Perone, M., & Hursh, D. E. (2013). Single-case experimental designs. In G. J. Madden (Ed.), APA handbook of behavior analysis: Vol. 1. Methods and principles . American Psychological Association.

Poling, A., & Grossett, D. (1986). Basic research designs in applied behavior analysis. In A. Poling & R. W. Fuqua (Eds.), Research methods in applied behavior analysis . Springer. https://doi.org/10.1007/978-1-4684-8786-2_2

Shadish, W. R., Hedges, L. V., & Pustejovsky, J. E. (2014). Analysis and meta-analysis of single-case designs with a standardized mean difference statistic: A primer and applications. Journal of School Psychology, 52 , 123–147.

Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology . Authors Cooperative, Inc.

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis . Appleton-Century.

Skinner, B. F. (1966). Operant behavior. In W. K. Honig (Ed.), Operant behavior: Areas of research and application . Cambridge University Press.

Spencer, E. J., Goldstein, H., Sherman, A., Noe, S., Tabbah, R., Ziolkowski, R., & Schneider, N. (2012). Effects of an automated vocabulary and comprehension intervention: An early efficacy study. Journal of Early Intervention, 34 (4), 195–221. https://doi.org/10.1177/1053815112471990

Wang, Y., Kang, S., Ramirez, J., & Tarbox, J. (2019). Multilingual diversity in the field of applied behavior analysis and autism: A brief review and discussion of future directions. Behavior Analysis in Practice, 12 , 795–804.

Weaver, E. S., & Lloyd, B. P. (2019). Randomization tests for single case designs with rapidly alternating conditions: An analysis of p-values from published experiments. Perspectives on Behavior Science, 42 (3), 617–645. https://doi.org/10.1007/s40614-018-0165-6

Wine, B., Freeman, T. R., & King, A. (2015). Withdrawal versus reversal: A necessary distinction? Behavioral Interventions, 30 , 87–93.

Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11 , 203–214.


Author information

Authors and affiliations.

University of South Florida, Tampa, FL, USA

Lodi Lipien, Megan Kirby & John M. Ferron


Corresponding author

Correspondence to John M. Ferron .

Editor information

Editors and affiliations.

Department of Psychology, Louisiana State University, Baton Rouge, LA, USA

Johnny L. Matson


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Lipien, L., Kirby, M., Ferron, J.M. (2023). Single-Case Designs. In: Matson, J.L. (eds) Handbook of Applied Behavior Analysis. Autism and Child Psychopathology Series. Springer, Cham. https://doi.org/10.1007/978-3-031-19964-6_20

Download citation

DOI : https://doi.org/10.1007/978-3-031-19964-6_20

Published : 30 April 2023

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-19963-9

Online ISBN : 978-3-031-19964-6

eBook Packages : Behavioral Science and Psychology; Behavioral Science and Psychology (R0)



10.3 The Single-Subject Versus Group “Debate”

Learning Objectives

  • Explain some of the points of disagreement between advocates of single-subject research and advocates of group research.
  • Identify several situations in which single-subject research would be appropriate and several others in which group research would be appropriate.

Single-subject research is similar to group research—especially experimental group research—in many ways. They are both quantitative approaches that try to establish causal relationships by manipulating an independent variable, measuring a dependent variable, and controlling extraneous variables. But there are important differences between these approaches too, and these differences sometimes lead to disagreements. It is worth addressing the most common points of disagreement between single-subject researchers and group researchers and how these disagreements can be resolved. As we will see, single-subject research and group research are probably best conceptualized as complementary approaches.

Data Analysis

One set of disagreements revolves around the issue of data analysis. Some advocates of group research worry that visual inspection is inadequate for deciding whether and to what extent a treatment has affected a dependent variable. One specific concern is that visual inspection is not sensitive enough to detect weak effects. A second is that visual inspection can be unreliable, with different researchers reaching different conclusions about the same set of data (Danov & Symons, 2008) [1] . A third is that the results of visual inspection—an overall judgment of whether or not a treatment was effective—cannot be clearly and efficiently summarized or compared across studies (unlike the measures of relationship strength typically used in group research).

In general, single-subject researchers share these concerns. However, they also argue that their use of the steady state strategy, combined with their focus on strong and consistent effects, minimizes most of them. If the effect of a treatment is difficult to detect by visual inspection because the effect is weak or the data are noisy, then single-subject researchers look for ways to increase the strength of the effect or reduce the noise in the data by controlling extraneous variables (e.g., by administering the treatment more consistently). If the effect is still difficult to detect, then they are likely to consider it neither strong enough nor consistent enough to be of further interest. Many single-subject researchers also point out that statistical analysis is becoming increasingly common and that many of them are using this as a supplement to visual inspection—especially for the purpose of comparing results across studies (Scruggs & Mastropieri, 2001) [2] .
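One simple index of this kind is the percentage of nonoverlapping data (PND) associated with Scruggs and Mastropieri: the percentage of treatment-phase observations that fall beyond the most extreme baseline observation. The sketch below computes PND for a hypothetical AB dataset; the data are invented, and PND is only one of several nonoverlap indices a researcher might report.

```python
# Percentage of nonoverlapping data (PND) for a hypothetical AB design
# in which the treatment is expected to INCREASE the target behavior.
# Data values are invented for illustration.

baseline = [4, 5, 3, 4, 5, 4]      # A-phase observations
treatment = [6, 7, 5, 8, 7, 9, 8]  # B-phase observations

ceiling = max(baseline)  # most extreme baseline point (highest, since an increase is expected)
nonoverlapping = sum(1 for x in treatment if x > ceiling)
pnd = 100 * nonoverlapping / len(treatment)

print(f"Highest baseline point: {ceiling}")
print(f"PND = {pnd:.0f}% of treatment points exceed every baseline point")
# Rough benchmarks sometimes cited for PND: above 90% very effective,
# 70-90% effective, 50-70% questionable, below 50% ineffective.
```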

Turning the tables, some advocates of single-subject research worry about the way that group researchers analyze their data. Specifically, they point out that focusing on group means can be highly misleading. Again, imagine that a treatment has a strong positive effect on half the people exposed to it and an equally strong negative effect on the other half. In a traditional between-subjects experiment, the positive effect on half the participants in the treatment condition would be statistically cancelled out by the negative effect on the other half. The mean for the treatment group would then be the same as the mean for the control group, making it seem as though the treatment had no effect when in fact it had a strong effect on every single participant!

But again, group researchers share this concern. Although they do focus on group statistics, they also emphasize the importance of examining distributions of individual scores. For example, if some participants were positively affected by a treatment and others negatively affected by it, this would produce a bimodal distribution of scores and could be detected by looking at a histogram of the data. The use of within-subjects designs is another strategy that allows group researchers to observe effects at the individual level and even to specify what percentage of individuals exhibit strong, medium, weak, and even negative effects. Finally, factorial designs can be used to examine whether the effects of an independent variable on a dependent variable differ in different groups of participants (introverts vs. extraverts).
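A small simulation (invented numbers) makes both the cancellation problem and its remedy concrete: equal and opposite individual effects average out to roughly zero, yet a simple tally or histogram of individual change scores reveals the bimodal pattern immediately.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical change scores (post minus pre) for 40 treated participants:
# half improve by about 10 points, half deteriorate by about 10 points.
improvers = [10 + random.gauss(0, 1) for _ in range(20)]
worseners = [-10 + random.gauss(0, 1) for _ in range(20)]
treatment_changes = improvers + worseners

# A control group whose scores barely change at all.
control_changes = [random.gauss(0, 1) for _ in range(40)]

def mean(xs):
    return sum(xs) / len(xs)

print(f"Treatment group mean change: {mean(treatment_changes):+.2f}")
print(f"Control group mean change:   {mean(control_changes):+.2f}")
# The two means are nearly identical, as if the treatment did nothing.

# The distribution of individual scores tells a different story.
large_gains = sum(1 for x in treatment_changes if x > 5)
large_losses = sum(1 for x in treatment_changes if x < -5)
print(f"Treated participants with a large gain: {large_gains}")
print(f"Treated participants with a large loss: {large_losses}")
# Every treated participant changed substantially; the effects simply
# cancelled in the group average, and a histogram of these change
# scores would be clearly bimodal.
```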

External Validity

The second issue about which single-subject and group researchers sometimes disagree has to do with external validity—the ability to generalize the results of a study beyond the people and specific situation actually studied. In particular, advocates of group research point out the difficulty in knowing whether results for just a few participants are likely to generalize to others in the population. Imagine, for example, that in a single-subject study, a treatment has been shown to reduce self-injury for each of two children with intellectual disabilities. Even if the effect is strong for these two children, how can one know whether this treatment is likely to work for other children with intellectual delays?

Again, single-subject researchers share this concern. In response, they note that the strong and consistent effects they are typically interested in—even when observed in small samples—are likely to generalize to others in the population. Single-subject researchers also note that they place a strong emphasis on replicating their research results. When they observe an effect with a small sample of participants, they typically try to replicate it with another small sample—perhaps with a slightly different type of participant or under slightly different conditions. Each time they observe similar results, they rightfully become more confident in the generality of those results. Single-subject researchers can also point to the fact that the principles of classical and operant conditioning—most of which were discovered using the single-subject approach—have been successfully generalized across an incredibly wide range of species and situations.

And, once again turning the tables, single-subject researchers have concerns of their own about the external validity of group research. One extremely important point they make is that studying large groups of participants does not entirely solve the problem of generalizing to other  individuals . Imagine, for example, a treatment that has been shown to have a small positive effect on average in a large group study. It is likely that although many participants exhibited a small positive effect, others exhibited a large positive effect, and still others exhibited a small negative effect. When it comes to applying this treatment to another large  group , we can be fairly sure that it will have a small effect on average. But when it comes to applying this treatment to another  individual , we cannot be sure whether it will have a small, a large, or even a negative effect. Another point that single-subject researchers make is that group researchers also face a similar problem when they study a single situation and then generalize their results to other situations. For example, researchers who conduct a study on the effect of cell phone use on drivers on a closed oval track probably want to apply their results to drivers in many other real-world driving situations. But notice that this requires generalizing from a single situation to a population of situations. Thus the ability to generalize is based on much more than just the sheer number of participants one has studied. It requires a careful consideration of the similarity of the participants  and  situations studied to the population of participants and situations to which one wants to generalize (Shadish, Cook, & Campbell, 2002) [3] .

Single-Subject and Group Research as Complementary Methods

As with quantitative and qualitative research, it is probably best to conceptualize single-subject research and group research as complementary methods that have different strengths and weaknesses and that are appropriate for answering different kinds of research questions (Kazdin, 1982) [4] . Single-subject research is particularly good for testing the effectiveness of treatments on individuals when the focus is on strong, consistent, and biologically or socially important effects. It is also especially useful when the behavior of particular individuals is of interest. Clinicians who work with only one individual at a time may find that it is their only option for doing systematic quantitative research.

Group research, on the other hand, is ideal for testing the effectiveness of treatments at the group level. Among the advantages of this approach is that it allows researchers to detect weak effects, which can be of interest for many reasons. For example, finding a weak treatment effect might lead to refinements of the treatment that eventually produce a larger and more meaningful effect. Group research is also good for studying interactions between treatments and participant characteristics. For example, if a treatment is effective for those who are high in motivation to change and ineffective for those who are low in motivation to change, then a group design can detect this much more efficiently than a single-subject design. Group research is also necessary to answer questions that cannot be addressed using the single-subject approach, including questions about independent variables that cannot be manipulated (e.g., number of siblings, extraversion, culture).
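As a toy illustration of how a factorial group design exposes such an interaction, the sketch below tabulates cell means for a hypothetical 2 x 2 layout crossing condition (treatment vs. control) with motivation (high vs. low); all scores are invented.

```python
# Hypothetical outcome scores in a 2 x 2 (condition x motivation) design.
# Higher scores indicate a better outcome; all values are invented.
data = {
    ("treatment", "high motivation"): [8, 9, 7, 9, 8],
    ("treatment", "low motivation"):  [4, 5, 4, 3, 5],
    ("control",   "high motivation"): [4, 5, 5, 4, 4],
    ("control",   "low motivation"):  [5, 4, 4, 5, 4],
}

def mean(xs):
    return sum(xs) / len(xs)

for (condition, motivation), scores in data.items():
    print(f"{condition:<9} / {motivation:<15}: mean = {mean(scores):.1f}")

# The treatment raises scores only for highly motivated participants,
# an interaction that the factorial layout makes visible in the cell
# means but that a study of one or two individuals could not test.
```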

Finally, it is important to understand that the single-subject and group approaches represent different research traditions. This factor is probably the most important one affecting which approach a researcher uses. Researchers in the experimental analysis of behavior and applied behavior analysis learn to conceptualize their research questions in ways that are amenable to the single-subject approach. Researchers in most other areas of psychology learn to conceptualize their research questions in ways that are amenable to the group approach. At the same time, there are many topics in psychology in which research from the two traditions has informed each other and been successfully integrated. One example is research suggesting that both animals and humans have an innate “number sense”—an awareness of how many objects or events of a particular type they have experienced without actually having to count them (Dehaene, 2011) [5]. Single-subject research with rats and birds and group research with human infants have shown strikingly similar abilities in those populations to discriminate small numbers of objects and events. This number sense—which probably evolved long before humans did—may even be the foundation of humans’ advanced mathematical abilities.

The Principle of Converging Evidence

Now that you have been introduced to many of the most commonly used research methods in psychology it should be readily apparent that no design is perfect. Every research design has strengths and weakness. True experiments typically have high internal validity but may have problems with external validity, while non-experimental research (e.g., correlational research) often has good external validity but poor internal validity. Each study brings us closer to the truth but no single study can ever be considered definitive. This is one reason why, in science, we say there is no such thing as scientific proof, there is only scientific evidence.

While the media will often try to reach strong conclusions on the basis of the findings of one study, scientists focus on evaluating a body of research. Scientists evaluate theories not by waiting for the perfect experiment but by looking at the overall trends in a number of partially flawed studies. The idea of converging evidence tells us to examine the pattern of flaws running through the research literature because the nature of this pattern can either support or undermine the conclusions we wish to draw. Suppose the findings from a number of different studies were largely consistent in supporting a particular conclusion. If all of the studies were flawed in a similar way, for example, if all of the studies were correlational and contained the third variable problem and the directionality problem, this would undermine confidence in the conclusions drawn because the consistency of the outcome may simply have resulted from a particular flaw that all of the studies shared. On the other hand, if all of the studies were flawed in different ways and the weakness of some of the studies were the strength of others (the low external validity of a true experiment was balanced by the high external validity of a correlational study), then we could be more confident in our conclusions.

While there are fundamental tradeoffs in different research methods, the diverse set of approaches used by psychologists have complementary strengths that allow us to search for converging evidence. We can reach meaningful conclusions and come closer to understanding truth by examining a large number of different studies each with different strengths and weakness. If the result of a large number of studies all conducted using different designs converge on the same conclusion then our confidence in that conclusion can be increased dramatically. In science, we strive for progress, not perfection.

Key Takeaways

  • Differences between single-subject research and group research sometimes lead to disagreements between single-subject and group researchers. These disagreements center on the issues of data analysis and external validity (especially generalization to other people).
  • Single-subject research and group research are probably best seen as complementary methods, with different strengths and weaknesses, that are appropriate for answering different kinds of research questions.
  • Discussion: Imagine you have conducted a single-subject study showing a positive effect of a treatment on the behavior of a man with social anxiety disorder. Your research has been criticized on the grounds that it cannot be generalized to others. How could you respond to this criticism?
  • Discussion: Imagine you have conducted a group study showing a positive effect of a treatment on the behavior of a group of people with social anxiety disorder, but your research has been criticized on the grounds that “average” effects cannot be generalized to individuals. How could you respond to this criticism?
  • Practice: Redesign as a group study the study by Hall and his colleagues described at the beginning of this chapter, and list the strengths and weaknesses of your new study compared with the original study.
  • Practice: The generation effect refers to the fact that people who generate information as they are learning it (e.g., by self-testing) recall it better later than do people who simply review information. Design a single-subject study on the generation effect applied to university students learning brain anatomy.
  • Danov, S. E., & Symons, F. E. (2008). A survey evaluation of the reliability of visual inspection and functional analysis graphs. Behavior Modification, 32, 828–839.
  • Scruggs, T. E., & Mastropieri, M. A. (2001). How to summarize single-participant research: Ideas and applications. Exceptionality, 9, 227–244.
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
  • Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.
  • Dehaene, S. (2011). The number sense: How the mind creates mathematics (2nd ed.). New York, NY: Oxford University Press.


Unlike the Case Study, the Single-Participant Experiment ____

Question 73

Unlike the case study, the single-participant experiment ____.

A) cannot determine cause-and-effect relationships
B) is based on the nomothetic research orientation
C) is better able to assess cause-and-effect relationships
D) is a correlational design

Correct Answer: C) is better able to assess cause-and-effect relationships



Chapter 9: Simple Experiments

What Is an Experiment?

As we saw earlier, an experiment is a type of study designed specifically to answer the question of whether there is a causal relationship between two variables. Do changes in an independent variable cause changes in a dependent variable? Experiments have two fundamental features. The first is that the researchers manipulate, or systematically vary, the level of the independent variable. The different levels of the independent variable are called conditions. For example, in Darley and Latané’s experiment, the independent variable was the number of witnesses that participants believed to be present. The researchers manipulated this independent variable by telling participants that there were either one, two, or five other students involved in the discussion, thereby creating three conditions. The second fundamental feature of an experiment is that the researcher controls, or minimizes the variability in, variables other than the independent and dependent variable. These other variables are called extraneous variables. Darley and Latané tested all their participants in the same room, exposed them to the same emergency situation, and so on. They also randomly assigned their participants to conditions so that the three groups would be similar to each other to begin with. Notice that although the words manipulation and control have similar meanings in everyday language, researchers make a clear distinction between them. They manipulate the independent variable by systematically changing its levels and control other variables by holding them constant.
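For illustration, the sketch below shows one simple way random assignment might be carried out in code, dealing a hypothetical list of participants evenly across three conditions like those just described; the participant IDs, condition labels, and procedure are assumptions for the example, not Darley and Latané's actual method.

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical participant IDs and three conditions (the number of other
# discussion-group members participants believe are present).
participants = [f"P{i:02d}" for i in range(1, 13)]
conditions = ["1 other", "2 others", "5 others"]

random.shuffle(participants)              # put participants in a random order
assignment = {
    pid: conditions[i % len(conditions)]  # deal them out evenly across conditions
    for i, pid in enumerate(participants)
}

for pid, condition in sorted(assignment.items()):
    print(pid, "->", condition)
# Because the order was randomized, the three groups should be similar,
# on average, on every extraneous variable, measured or unmeasured,
# before the manipulation takes place.
```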

9.1  Experiment Basics

Internal Validity

Recall that the fact that two variables are statistically related does not necessarily mean that one causes the other. “Correlation does not imply causation.” For example, if it were the case that people who exercise regularly are happier than people who do not exercise regularly, this would not necessarily mean that exercising increases people’s happiness. It could mean instead that greater happiness causes people to exercise (the directionality problem) or that something like better physical health causes people to exercise and be happier (the third-variable problem).

The purpose of an experiment, however, is to show that two variables are statistically related and to do so in a way that supports the conclusion that the independent variable caused any observed differences in the dependent variable. The basic logic is this: If the researcher creates two or more highly similar conditions and then manipulates the independent variable to produce just one difference between them, then any later difference between the conditions must have been caused by the independent variable. For example, because the only difference between Darley and Latané’s conditions was the number of students that participants believed to be involved in the discussion, this must have been responsible for differences in helping between the conditions.

An empirical study is said to be high in internal validity if the way it was conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Thus experiments are high in internal validity because the way they are conducted—with the manipulation of the independent variable and the control of extraneous variables—provides strong support for causal conclusions.

External Validity

At the same time, the way that experiments are conducted sometimes leads to a different kind of criticism. Specifically, the need to manipulate the independent variable and control extraneous variables means that experiments are often conducted under conditions that seem artificial or unlike “real life” (Stanovich, 2010). In many psychology experiments, the participants are all college undergraduates and come to a classroom or laboratory to fill out a series of paper-and-pencil questionnaires or to perform a carefully designed computerized task. Consider, for example, an experiment in which researcher Barbara Fredrickson and her colleagues had college students come to a laboratory on campus and complete a math test while wearing a swimsuit (Fredrickson, Roberts, Noll, Quinn, & Twenge, 1998). At first, this might seem silly. When will college students ever have to complete math tests in their swimsuits outside of this experiment?

The issue we are confronting is that of external validity. An empirical study is high in external validity if the way it was conducted supports generalizing the results to people and situations beyond those actually studied. As a general rule, studies are higher in external validity when the participants and the situation studied are similar to those that the researchers want to generalize to. Imagine, for example, that a group of researchers is interested in how shoppers in large grocery stores are affected by whether breakfast cereal is packaged in yellow or purple boxes. Their study would be high in external validity if they studied the decisions of ordinary people doing their weekly shopping in a real grocery store. If the shoppers bought much more cereal in purple boxes, the researchers would be fairly confident that this would be true for other shoppers in other stores. Their study would be relatively low in external validity, however, if they studied a sample of college students in a laboratory at a selective college who merely judged the appeal of various colors presented on a computer screen. If the students judged purple to be more appealing than yellow, the researchers would not be very confident that this is relevant to grocery shoppers’ cereal-buying decisions.

We should be careful, however, not to draw the blanket conclusion that experiments are low in external validity. One reason is that experiments need not seem artificial. Consider that Darley and Latané’s experiment provided a reasonably good simulation of a real emergency situation. Or consider field experiments that are conducted entirely outside the laboratory. In one such experiment, Robert Cialdini and his colleagues studied whether hotel guests choose to reuse their towels for a second day as opposed to having them washed as a way of conserving water and energy (Cialdini, 2005). These researchers manipulated the message on a card left in a large sample of hotel rooms. One version of the message emphasized showing respect for the environment, another emphasized that the hotel would donate a portion of their savings to an environmental cause, and a third emphasized that most hotel guests choose to reuse their towels. The result was that guests who received the message that most hotel guests choose to reuse their towels reused their own towels substantially more often than guests receiving either of the other two messages. Given the way they conducted their study, it seems very likely that their result would hold true for other guests in other hotels.

A second reason not to draw the blanket conclusion that experiments are low in external validity is that they are often conducted to learn about psychological processes that are likely to operate in a variety of people and situations. Let us return to the experiment by Fredrickson and colleagues. They found that the women in their study, but not the men, performed worse on the math test when they were wearing swimsuits. They argued that this was due to women’s greater tendency to objectify themselves—to think about themselves from the perspective of an outside observer—which diverts their attention away from other tasks. They argued, furthermore, that this process of self-objectification and its effect on attention is likely to operate in a variety of women and situations—even if none of them ever finds herself taking a math test in her swimsuit.

Manipulation of the Independent Variable

Again, to manipulate an independent variable means to change its level systematically so that different groups of participants are exposed to different levels of that variable, or the same group of participants is exposed to different levels at different times. For example, to see whether expressive writing affects people’s health, a researcher might instruct some participants to write about traumatic experiences and others to write about neutral experiences. The different levels of the independent variable are referred to as conditions, and researchers often give the conditions short descriptive names to make it easy to talk and write about them. In this case, the conditions might be called the “traumatic condition” and the “neutral condition.”

Notice that the manipulation of an independent variable must involve the active intervention of the researcher. Comparing groups of people who differ on the independent variable before the study begins is not the same as manipulating that variable. For example, a researcher who compares the health of people who already keep a journal with the health of people who do not keep a journal has not manipulated this variable and therefore not conducted an experiment. This is important because groups that already differ in one way at the beginning of a study are likely to differ in other ways too. For example, people who choose to keep journals might also be more conscientious, more introverted, or less stressed than people who do not. Therefore, any observed difference between the two groups in terms of their health might have been caused by whether or not they keep a journal, or it might have been caused by any of the other differences between people who do and do not keep journals. Thus the active manipulation of the independent variable is crucial for eliminating the third-variable problem.

Of course, there are many situations in which the independent variable cannot be manipulated for practical or ethical reasons and therefore an experiment is not possible. For example, whether or not people have a significant early illness experience cannot be manipulated, making it impossible to do an experiment on the effect of early illness experiences on the development of hypochondriasis. This does not mean it is impossible to study the relationship between early illness experiences and hypochondriasis—only that it must be done using non-experimental approaches. We will discuss this in detail later in the book.

In many experiments, the independent variable is a construct that can only be manipulated indirectly. For example, a researcher might try to manipulate participants’ stress levels indirectly by telling some of them that they have five minutes to prepare a short speech that they will then have to give to an audience of other participants. In such situations, researchers often include a manipulation check in their procedure. A manipulation check is a separate measure of the construct the researcher is trying to manipulate. For example, researchers trying to manipulate participants’ stress levels might give them a paper-and-pencil stress questionnaire or take their blood pressure—perhaps right after the manipulation or at the end of the procedure—to verify that they successfully manipulated this variable.
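As a concrete and purely hypothetical illustration, here is a minimal Python sketch of how a manipulation check might be analyzed. The stress-questionnaire scores are invented, and an independent-samples t test is only one reasonable way to compare the two conditions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 0-40 stress-questionnaire scores collected right after the manipulation
speech = rng.normal(28, 5, 30)      # told they must prepare and give a speech
no_speech = rng.normal(18, 5, 30)   # no speech instruction

res = stats.ttest_ind(speech, no_speech)
print(f"speech M = {speech.mean():.1f}, no-speech M = {no_speech.mean():.1f}, "
      f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```

If the speech group does not score reliably higher on the stress measure, the manipulation probably did not work as intended, and any result for the main dependent variable becomes difficult to interpret.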

Control of Extraneous Variables

An extraneous variable is anything that varies in the context of a study other than the independent and dependent variables. In an experiment on the effect of expressive writing on health, for example, extraneous variables would include participant variables (individual differences) such as their writing ability, their diet, and their shoe size. They would also include situation or task variables such as the time of day when participants write, whether they write by hand or on a computer, and the weather. Extraneous variables pose a problem because many of them are likely to have some effect on the dependent variable. For example, participants’ health will be affected by many things other than whether or not they engage in expressive writing. This can make it difficult to separate the effect of the independent variable from the effects of the extraneous variables, which is why it is important to control extraneous variables by holding them constant.

One way to control extraneous variables is to hold them constant. This can mean holding situation or task variables constant by testing all participants in the same location, giving them identical instructions, treating them in the same way, and so on. It can also mean holding participant variables constant. For example, many studies of language limit participants to right-handed people, who generally have their language areas isolated in their left cerebral hemispheres. Left-handed people are more likely to have their language areas isolated in their right cerebral hemispheres or distributed across both hemispheres, which can change the way they process language and thereby add noise to the data.

In principle, researchers can control extraneous variables by limiting participants to one very specific category of person, such as 20-year-old, straight, female, right-handed, sophomore psychology majors. The obvious downside to this approach is that it would lower the external validity of the study—in particular, the extent to which the results can be generalized beyond the people actually studied. For example, it might be unclear whether results obtained with a sample of younger straight women would apply to older gay men. In many situations, the advantages of a diverse sample outweigh the reduction in noise achieved by a homogeneous one.

Extraneous Variables as Confounding Variables

The second way that extraneous variables can make it difficult to detect the effect of the independent variable is by becoming confounding variables. A confounding variable is an extraneous variable that differs on average across levels of the independent variable. For example, in almost all experiments, participants’ intelligence quotients (IQs) will be an extraneous variable. But as long as there are participants with lower and higher IQs at each level of the independent variable so that the average IQ is roughly equal, then this variation is probably acceptable (and may even be desirable). What would be bad, however, would be for participants at one level of the independent variable to have substantially lower IQs on average and participants at another level to have substantially higher IQs on average. In this case, IQ would be a confounding variable.

To confound means to confuse, and this is exactly what confounding variables do. Because they differ across conditions—just like the independent variable—they provide an alternative explanation for any observed difference in the dependent variable. Consider the results of a hypothetical study in which participants in a positive mood condition scored higher on a memory task than participants in a negative mood condition. If IQ is a confounding variable—with participants in the positive mood condition having higher IQs on average than participants in the negative mood condition—then it is unclear whether it was the positive moods or the higher IQs that caused participants in the first condition to score higher. One way to avoid confounding variables is by holding extraneous variables constant. For example, one could prevent IQ from becoming a confounding variable by limiting participants only to those with IQs of exactly 100. But this approach is not always desirable for reasons we have already discussed. A second and much more general approach—random assignment to conditions—will be discussed in detail shortly.
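To see why a confounding variable is so damaging, consider the following small simulation, a hedged sketch in Python with invented numbers. In this simulation mood has no effect at all on memory, yet a confound with IQ produces what looks like a clear mood effect:

```python
import numpy as np

rng = np.random.default_rng(42)

# Confound: the positive-mood group happens to have higher IQs on average
iq_positive = rng.normal(110, 10, 50)
iq_negative = rng.normal(95, 10, 50)

# Memory scores here depend only on IQ; mood itself has no effect at all
memory_positive = 0.5 * iq_positive + rng.normal(0, 5, 50)
memory_negative = 0.5 * iq_negative + rng.normal(0, 5, 50)

# A sizable group difference appears anyway, caused entirely by the IQ confound
print(round(memory_positive.mean() - memory_negative.mean(), 2))
```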

Key Takeaways

·         An experiment is a type of empirical study that features the manipulation of an independent variable, the measurement of a dependent variable, and control of extraneous variables.

·         Studies are high in internal validity to the extent that the way they are conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Experiments are generally high in internal validity because of the manipulation of the independent variable and control of extraneous variables.

·         Studies are high in external validity to the extent that the result can be generalized to people and situations beyond those actually studied. Although experiments can seem “artificial”—and low in external validity—it is important to consider whether the psychological processes under study are likely to operate in other people and situations.

9.2  Experimental Design

In this section, we look at some different ways to design an experiment. The primary distinction we will make is between approaches in which each participant experiences one level of the independent variable and approaches in which each participant experiences all levels of the independent variable. The former are called between-subjects experiments and the latter are called within-subjects experiments.

Between-Subjects Experiments

In a between-subjects experiment, each participant is tested in only one condition. For example, a researcher with a sample of 100 college students might assign half of them to write about a traumatic event and the other half to write about a neutral event. Or a researcher with a sample of 60 people with severe agoraphobia (fear of open spaces) might assign 20 of them to receive each of three different treatments for that disorder. It is essential in a between-subjects experiment that the researcher assign participants to conditions so that the different groups are, on average, highly similar to each other. Those in a trauma condition and a neutral condition, for example, should include a similar proportion of men and women, and they should have similar average intelligence quotients (IQs), similar average levels of motivation, similar average numbers of health problems, and so on. This is a matter of controlling these extraneous participant variables across conditions so that they do not become confounding variables.

Random Assignment

The primary way that researchers accomplish this kind of control of extraneous variables across conditions is called random assignment, which means using a random process to decide which participants are tested in which conditions. Do not confuse random assignment with random sampling. Random sampling is a method for selecting a sample from a population, and it is rarely used in psychological research. Random assignment is a method for assigning participants in a sample to the different conditions, and it is an important element of all experimental research in psychology and other fields too.

In its strictest sense, random assignment should meet two criteria. One is that each participant has an equal chance of being assigned to each condition (e.g., a 50% chance of being assigned to each of two conditions). The second is that each participant is assigned to a condition independently of other participants. Thus one way to assign participants to two conditions would be to flip a coin for each one. If the coin lands heads, the participant is assigned to Condition A, and if it lands tails, the participant is assigned to Condition B. For three conditions, one could use a computer to generate a random integer from 1 to 3 for each participant. If the integer is 1, the participant is assigned to Condition A; if it is 2, the participant is assigned to Condition B; and if it is 3, the participant is assigned to Condition C. In practice, a full sequence of conditions—one for each participant expected to be in the experiment—is usually created ahead of time, and each new participant is assigned to the next condition in the sequence as he or she is tested. When the procedure is computerized, the computer program often handles the random assignment.
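In practice, researchers often generate the assignment sequence with software rather than by flipping coins. The following minimal Python sketch implements strict random assignment under the two criteria just described; the condition labels and the sample size of 20 are arbitrary:

```python
import random

def assign(conditions=("A", "B")):
    """Each participant has an equal chance of each condition,
    independently of every other participant."""
    return random.choice(conditions)

# A full sequence is usually generated before testing begins;
# each new participant is then given the next entry in the sequence.
sequence = [assign() for _ in range(20)]
print(sequence)
```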

One problem with coin flipping and other strict procedures for random assignment is that they are likely to result in unequal sample sizes in the different conditions. Unequal sample sizes are generally not a serious problem, and you should never throw away data you have already collected to achieve equal sample sizes. However, for a fixed number of participants, it is statistically most efficient to divide them into equal-sized groups. It is standard practice, therefore, to use a kind of modified random assignment that keeps the number of participants in each group as similar as possible. One approach is block randomization. In block randomization, all the conditions occur once in the sequence before any of them is repeated. Then they all occur again before any of them is repeated again. Within each of these “blocks,” the conditions occur in a random order. Again, the sequence of conditions is usually generated before any participants are tested, and each new participant is assigned to the next condition in the sequence.

Random assignment is not guaranteed to control all extraneous variables across conditions. It is always possible that just by chance, the participants in one condition might turn out to be substantially older, less tired, more motivated, or less depressed on average than the participants in another condition. However, there are some reasons that this is not a major concern. One is that random assignment works better than one might expect, especially for large samples. Another is that the inferential statistics that researchers use to decide whether a difference between groups reflects a difference in the population take the “fallibility” of random assignment into account. Yet another reason is that even if random assignment does result in a confounding variable and therefore produces misleading results, this is likely to be detected when the experiment is replicated. The upshot is that random assignment to conditions—although not infallible in terms of controlling extraneous variables—is always considered a strength of a research design.
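A block-randomized sequence of the kind described above is also easy to generate with a few lines of code. The sketch below (Python, with arbitrary condition labels) produces equal group sizes as long as the number of participants is a multiple of the number of conditions:

```python
import random

def block_randomize(conditions, n_blocks):
    """Every condition occurs exactly once, in a random order, within each block."""
    sequence = []
    for _ in range(n_blocks):
        block = list(conditions)
        random.shuffle(block)
        sequence.extend(block)
    return sequence

# e.g., 3 conditions x 10 blocks = a schedule for 30 participants,
# with exactly 10 participants per condition
schedule = block_randomize(["A", "B", "C"], n_blocks=10)
print(schedule)
```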

Treatment and Control Conditions

Between-subjects experiments are often used to determine whether a treatment works. In psychological research, a treatment is any intervention meant to change people’s behavior for the better. This includes psychotherapies and medical treatments for psychological disorders but also interventions designed to improve learning, promote conservation, reduce prejudice, and so on. To determine whether a treatment works, participants are randomly assigned to either a treatment condition, in which they receive the treatment, or a control condition, in which they do not receive the treatment. If participants in the treatment condition end up better off than participants in the control condition—for example, they are less depressed, learn faster, conserve more, express less prejudice—then the researcher can conclude that the treatment works. In research on the effectiveness of psychotherapies and medical treatments, this type of experiment is often called a randomized clinical trial.

There are different types of control conditions. In a no-treatment control condition, participants receive no treatment whatsoever. One problem with this approach, however, is the existence of placebo effects. A placebo is a simulated treatment that lacks any active ingredient or element that should make it effective, and a placebo effect is a positive effect of such a treatment. Many folk remedies that seem to work—such as eating chicken soup for a cold or placing soap under the bedsheets to stop nighttime leg cramps—are probably nothing more than placebos. Although placebo effects are not well understood, they are probably driven primarily by people’s expectations that they will improve. Having the expectation to improve can result in reduced stress, anxiety, and depression, which can alter perceptions and even improve immune system functioning (Price, Finniss, & Benedetti, 2008).

Placebo effects are interesting in their own right, but they also pose a serious problem for researchers who want to determine whether a treatment works. Fortunately, there are several solutions to this problem. One is to include a placebo control condition, in which participants receive a placebo that looks much like the treatment but lacks the active ingredient or element thought to be responsible for the treatment’s effectiveness. When participants in a treatment condition take a pill, for example, then those in a placebo control condition would take an identical-looking pill that lacks the active ingredient in the treatment (a “sugar pill”). In research on psychotherapy effectiveness, the placebo might involve going to a psychotherapist and talking in an unstructured way about one’s problems. The idea is that if participants in both the treatment and the placebo control groups expect to improve, then any improvement in the treatment group over and above that in the placebo control group must have been caused by the treatment and not by participants’ expectations.

Of course, the principle of informed consent requires that participants be told that they will be assigned to either a treatment or a placebo control condition—even though they cannot be told which until the experiment ends. In many cases the participants who had been in the control condition are then offered an opportunity to have the real treatment. An alternative approach is to use a waitlist control condition, in which participants are told that they will receive the treatment but must wait until the participants in the treatment condition have already received it. This allows researchers to compare participants who have received the treatment with participants who are not currently receiving it but who still expect to improve (eventually). A final solution to the problem of placebo effects is to leave out the control condition completely and compare any new treatment with the best available alternative treatment. For example, a new treatment for simple phobia could be compared with standard exposure therapy. Because participants in both conditions receive a treatment, their expectations about improvement should be similar. This approach also makes sense because once there is an effective treatment, the interesting question about a new treatment is not simply “Does it work?” but “Does it work better than what is already available?”

Within-Subjects Experiments

In a within-subjects experiment, each participant is tested under all conditions. Consider an experiment on the effect of a defendant’s physical attractiveness on judgments of his guilt. Again, in a between-subjects experiment, one group of participants would be shown an attractive defendant and asked to judge his guilt, and another group of participants would be shown an unattractive defendant and asked to judge his guilt. In a within-subjects experiment, however, the same group of participants would judge the guilt of both an attractive and an unattractive defendant.

The primary advantage of this approach is that it provides maximum control of extraneous participant variables. Participants in all conditions have the same mean IQ, same socioeconomic status, same number of siblings, and so on—because they are the very same people. Within-subjects experiments also make it possible to use statistical procedures that remove the effect of these extraneous participant variables on the dependent variable and therefore make the data less “noisy” and the effect of the independent variable easier to detect. We will look more closely at this idea later in the book.

Carryover Effects and Counterbalancing

The primary disadvantage of within-subjects designs is that they can result in carryover effects. A carryover effect is an effect of being tested in one condition on participants’ behavior in later conditions. One type of carryover effect is a practice effect, where participants perform a task better in later conditions because they have had a chance to practice it. Another type is a fatigue effect, where participants perform a task worse in later conditions because they become tired or bored. Being tested in one condition can also change how participants perceive stimuli or interpret their task in later conditions. This is called a context effect. For example, an average-looking defendant might be judged more harshly when participants have just judged an attractive defendant than when they have just judged an unattractive defendant. Within-subjects experiments also make it easier for participants to guess the hypothesis. For example, a participant who is asked to judge the guilt of an attractive defendant and then is asked to judge the guilt of an unattractive defendant is likely to guess that the hypothesis is that defendant attractiveness affects judgments of guilt. This could lead the participant to judge the unattractive defendant more harshly because he thinks this is what he is expected to do. Or it could make participants judge the two defendants similarly in an effort to be “fair.”

Carryover effects can be interesting in their own right. (Does the attractiveness of one person depend on the attractiveness of other people that we have seen recently?) But when they are not the focus of the research, carryover effects can be problematic. Imagine, for example, that participants judge the guilt of an attractive defendant and then judge the guilt of an unattractive defendant. If they judge the unattractive defendant more harshly, this might be because of his unattractiveness. But it could be instead that they judge him more harshly because they are becoming bored or tired. In other words, the order of the conditions is a confounding variable. The attractive condition is always the first condition and the unattractive condition the second. Thus any difference between the conditions in terms of the dependent variable could be caused by the order of the conditions and not the independent variable itself.

There is a solution to the problem of order effects, however, that can be used in many situations. It is counterbalancing, which means testing different participants in different orders. For example, some participants would be tested in the attractive defendant condition followed by the unattractive defendant condition, and others would be tested in the unattractive condition followed by the attractive condition. With three conditions, there would be six different orders (ABC, ACB, BAC, BCA, CAB, and CBA), so some participants would be tested in each of the six orders. With counterbalancing, participants are assigned to orders randomly, using the techniques we have already discussed. Thus random assignment plays an important role in within-subjects designs just as in between-subjects designs. Here, instead of randomly assigning to conditions, they are randomly assigned to different orders of conditions. In fact, it can safely be said that if a study does not involve random assignment in one form or another, it is not an experiment.
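Counterbalancing can be implemented with the same tools. The sketch below (Python; the participant IDs are hypothetical) lists all possible orders of three conditions and assigns participants to orders in randomized blocks so that each order is used about equally often:

```python
import itertools
import random

conditions = ["A", "B", "C"]
orders = list(itertools.permutations(conditions))   # ABC, ACB, BAC, BCA, CAB, CBA

participants = [f"P{i:02d}" for i in range(1, 13)]  # 12 hypothetical participant IDs

# Block-randomize over the six orders so each order is used equally often
schedule = []
while len(schedule) < len(participants):
    block = list(orders)
    random.shuffle(block)
    schedule.extend(block)

assignment = dict(zip(participants, schedule))
print(assignment)
```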

There are two ways to think about what counterbalancing accomplishes. One is that it controls the order of conditions so that it is no longer a confounding variable. Instead of the attractive condition always being first and the unattractive condition always being second, the attractive condition comes first for some participants and second for others. Likewise, the unattractive condition comes first for some participants and second for others. Thus any overall difference in the dependent variable between the two conditions cannot have been caused by the order of conditions. A second way to think about what counterbalancing accomplishes is that if there are carryover effects, it makes it possible to detect them. One can analyze the data separately for each order to see whether it had an effect.

Simultaneous Within-Subjects Designs

So far, we have discussed an approach to within-subjects designs in which participants are tested in one condition at a time. There is another approach, however, that is often used when participants make multiple responses in each condition. Imagine, for example, that participants judge the guilt of 10 attractive defendants and 10 unattractive defendants. Instead of having people make judgments about all 10 defendants of one type followed by all 10 defendants of the other type, the researcher could present all 20 defendants in a sequence that mixed the two types. The researcher could then compute each participant’s mean rating for each type of defendant. Or imagine an experiment designed to see whether people with social anxiety disorder remember negative adjectives (e.g., “stupid,” “incompetent”) better than positive ones (e.g., “happy,” “productive”). The researcher could have participants study a single list that includes both kinds of words and then have them try to recall as many words as possible. The researcher could then count the number of each type of word that was recalled. There are many ways to determine the order in which the stimuli are presented, but one common way is to generate a different random order for each participant.
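Generating a different random stimulus order for each participant is straightforward in software. Here is a minimal Python sketch using the adjective example; the word lists are invented, and seeding the random number generator with the participant's ID simply makes each participant's order reproducible:

```python
import random

positive = ["happy", "productive", "kind", "clever", "warm"]
negative = ["stupid", "incompetent", "rude", "lazy", "cold"]

def study_list(participant_id):
    """Mix both word types into a single list, in a different random order per participant."""
    rng = random.Random(participant_id)   # seeding makes each participant's order reproducible
    words = positive + negative
    rng.shuffle(words)
    return words

print(study_list(1))
print(study_list(2))
```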

Between-Subjects or Within-Subjects?

Almost every experiment can be conducted using either a between-subjects design or a within-subjects design. This means that researchers must choose between the two approaches based on their relative merits for the particular situation.

Between-subjects experiments have the advantage of being conceptually simpler and requiring less testing time per participant. They also avoid carryover effects without the need for counterbalancing. Within-subjects experiments have the advantage of controlling extraneous participant variables, which generally reduces noise in the data and makes it easier to detect a relationship between the independent and dependent variables.

A good rule of thumb, then, is that if it is possible to conduct a within-subjects experiment (with proper counterbalancing) in the time that is available per participant—and you have no serious concerns about carryover effects—this is probably the best option. If a within-subjects design would be difficult or impossible to carry out, then you should consider a between-subjects design instead. For example, if you were testing participants in a doctor’s waiting room or shoppers in line at a grocery store, you might not have enough time to test each participant in all conditions and therefore would opt for a between-subjects design. Or imagine you were trying to reduce people’s level of prejudice by having them interact with someone of another race. A within-subjects design with counterbalancing would require testing some participants in the treatment condition first and then in a control condition. But if the treatment works and reduces people’s level of prejudice, then they would no longer be suitable for testing in the control condition. This is true for many designs that involve a treatment meant to produce long-term change in participants’ behavior (e.g., studies testing the effectiveness of psychotherapy). Clearly, a between-subjects design would be necessary here.

Remember also that using one type of design does not preclude using the other type in a different study. There is no reason that a researcher could not use both a between-subjects design and a within-subjects design to answer the same research question. In fact, professional researchers often do exactly this.

Key Takeaways

·         Experiments can be conducted using either between-subjects or within-subjects designs. Deciding which to use in a particular situation requires careful consideration of the pros and cons of each approach.

·         Random assignment to conditions in between-subjects experiments or to orders of conditions in within-subjects experiments is a fundamental element of experimental research. Its purpose is to control extraneous variables so that they do not become confounding variables.

·         Experimental research on the effectiveness of a treatment requires both a treatment condition and a control condition, which can be a no-treatment control condition, a placebo control condition, or a waitlist control condition. Experimental treatments can also be compared with the best available alternative.

9.3  Conducting Experiments

The information presented so far in this chapter is enough to design a basic experiment. When it comes time to conduct that experiment, however, several additional practical issues arise. In this section, we consider some of these issues and how to deal with them. Much of this information applies to non-experimental studies as well as experimental ones.

Recruiting Participants

Of course, you should be thinking about how you will obtain your participants from the beginning of any research project. Unless you have access to people with schizophrenia or incarcerated juvenile offenders, for example, there is no point in designing a study that focuses on these populations. But even if you plan to use a convenience sample, you will have to recruit participants for your study.

There are several approaches to recruiting participants. One is to use participants from a formal subject pool—an established group of people who have agreed to be contacted about participating in research studies. For example, at many colleges and universities, there is a subject pool consisting of students enrolled in introductory psychology courses who must participate in a certain number of studies to meet a course requirement. Researchers post descriptions of their studies and students sign up to participate, usually via an online system. Participants who are not in subject pools can also be recruited by posting or publishing advertisements or making personal appeals to groups that represent the population of interest. For example, a researcher interested in studying older adults could arrange to speak at a meeting of the residents at a retirement community to explain the study and ask for volunteers.

The Volunteer Subject

Even if the participants in a study receive compensation in the form of course credit, a small amount of money, or a chance at being treated for a psychological problem, they are still essentially volunteers. This is worth considering because people who volunteer to participate in psychological research have been shown to differ in predictable ways from those who do not volunteer. Specifically, there is good evidence that on average, volunteers have the following characteristics compared with non-volunteers (Rosenthal & Rosnow, 1976):

·         They are more interested in the topic of the research.

·         They are more educated.

·         They have a greater need for approval.

·         They have higher intelligence quotients (IQs).

·         They are more sociable.

·         They are higher in social class.

This can be an issue of external validity if there is reason to believe that participants with these characteristics are likely to behave differently than the general population. For example, in testing different methods of persuading people, a rational argument might work better on volunteers than it does on the general population because of their generally higher educational level and IQ.

In many field experiments, the task is not recruiting participants but selecting them. For example, researchers Nicolas Guéguen and Marie-Agnès de Gail conducted a field experiment on the effect of being smiled at on helping, in which the participants were shoppers at a supermarket. A confederate walking down a stairway gazed directly at a shopper walking up the stairway and either smiled or did not smile. Shortly afterward, the shopper encountered another confederate, who dropped some computer diskettes on the ground. The dependent variable was whether or not the shopper stopped to help pick up the diskettes (Guéguen & de Gail, 2003). Notice that these participants were not “recruited,” but the researchers still had to select them from among all the shoppers taking the stairs that day. It is extremely important that this kind of selection be done according to a well-defined set of rules that is established before the data collection begins and can be explained clearly afterward. In this case, with each trip down the stairs, the confederate was instructed to gaze at the first person he encountered who appeared to be between the ages of 20 and 50. Only if the person gazed back did he or she become a participant in the study. The point of having a well-defined selection rule is to avoid bias in the selection of participants. For example, if the confederate was free to choose which shoppers he would gaze at, he might choose friendly-looking shoppers when he was set to smile and unfriendly-looking ones when he was not set to smile. As we will see shortly, such biases can be entirely unintentional.

Standardizing the Procedure

It is surprisingly easy to introduce extraneous variables during the procedure. For example, the same experimenter might give clear instructions to one participant but vague instructions to another. Or one experimenter might greet participants warmly while another barely makes eye contact with them. To the extent that such variables affect participants’ behaviour, they add noise to the data and make the effect of the independent variable more difficult to detect. If they vary across conditions, they become confounding variables and provide alternative explanations for the results. For example, if participants in a treatment group are tested by a warm and friendly experimenter and participants in a control group are tested by a cold and unfriendly one, then what appears to be an effect of the treatment might actually be an effect of experimenter demeanour.

Experimenter Expectancy Effects

It is well known that whether research participants are male or female can affect the results of a study. But what about whether the experimenter is male or female? There is plenty of evidence that this matters too. Male and female experimenters have slightly different ways of interacting with their participants, and of course participants also respond differently to male and female experimenters (Rosenthal, 1976). For example, in a recent study on pain perception, participants immersed their hands in icy water for as long as they could (Ibolya, Brake, & Voss, 2004). Male participants tolerated the pain longer when the experimenter was a woman, and female participants tolerated it longer when the experimenter was a man.

Researcher Robert Rosenthal has spent much of his career showing that this kind of unintended variation in the procedure does, in fact, affect participants’ behaviour. Furthermore, one important source of such variation is the experimenter’s expectations about how participants “should” behave in the experiment. This is referred to as an experimenter expectancy effect (Rosenthal, 1976). For example, if an experimenter expects participants in a treatment group to perform better on a task than participants in a control group, then he or she might unintentionally give the treatment group participants clearer instructions or more encouragement or allow them more time to complete the task. In a striking example, Rosenthal and Kermit Fode had several students in a laboratory course in psychology train rats to run through a maze. Although the rats were genetically similar, some of the students were told that they were working with “maze-bright” rats that had been bred to be good learners, and other students were told that they were working with “maze-dull” rats that had been bred to be poor learners. Sure enough, over five days of training, the “maze-bright” rats made more correct responses, made the correct response more quickly, and improved more steadily than the “maze-dull” rats (Rosenthal & Fode, 1963). Clearly it had to have been the students’ expectations about how the rats would perform that made the difference. But how? Some clues come from data gathered at the end of the study, which showed that students who expected their rats to learn quickly felt more positively about their animals and reported behaving toward them in a more friendly manner (e.g., handling them more).

The way to minimize unintended variation in the procedure is to standardize it as much as possible so that it is carried out in the same way for all participants regardless of the condition they are in. Here are several ways to do this:

·         Create a written protocol that specifies everything that the experimenters are to do and say from the time they greet participants to the time they dismiss them.

·         Create standard instructions that participants read themselves or that are read to them word for word by the experimenter.

·         Automate the rest of the procedure as much as possible by using software packages for this purpose or even simple computer slide shows.

·         Anticipate participants’ questions and either raise and answer them in the instructions or develop standard answers for them.

·         Train multiple experimenters on the protocol together and have them practice on each other.

·         Be sure that each experimenter tests participants in all conditions.

Another good practice is to arrange for the experimenters to be “blind” to the research question or to the condition that each participant is tested in. The idea is to minimize experimenter expectancy effects by minimizing the experimenters’ expectations. For example, in a drug study in which each participant receives the drug or a placebo, it is often the case that neither the participants nor the experimenter who interacts with them knows which condition each participant has been assigned to. Because both the participants and the experimenters are blind to the condition, this is referred to as a double-blind study. (A single-blind study is one in which the participant, but not the experimenter, is blind to the condition.) Of course, there are many times this is not possible. For example, if you are both the investigator and the only experimenter, it is not possible for you to remain blind to the research question. Also, in many studies the experimenter must know the condition because he or she must carry out the procedure in a different way in the different conditions.

Record Keeping

It is essential to keep good records when you conduct an experiment. As discussed earlier, it is typical for experimenters to generate a written sequence of conditions before the study begins and then to test each new participant in the next condition in the sequence. As you test them, it is a good idea to add to this list basic demographic information; the date, time, and place of testing; and the name of the experimenter who did the testing. It is also a good idea to have a place for the experimenter to write down comments about unusual occurrences (e.g., a confused or uncooperative participant) or questions that come up. This kind of information can be useful later if you decide to analyze sex differences or effects of different experimenters, or if a question arises about a particular participant or testing session.

It can also be useful to assign an identification number to each participant as you test them. Simply numbering them consecutively beginning with 1 is usually sufficient. This number can then also be written on any response sheets or questionnaires that participants generate, making it easier to keep them together.
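If testing is computerized, the participant log can be written automatically. The following Python sketch appends one row per participant to a CSV file; the file name, column order, and experimenter name are only examples:

```python
import csv
from datetime import datetime

def log_participant(path, participant_id, condition, experimenter, comments=""):
    """Append one row (ID, condition, timestamp, experimenter, comments) to a running log."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            participant_id,
            condition,
            datetime.now().isoformat(timespec="minutes"),
            experimenter,
            comments,
        ])

# Hypothetical file name and experimenter name, for illustration only
log_participant("participant_log.csv", 1, "treatment", "J. Smith", "arrived late")
```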

Pilot Testing

It is always a good idea to conduct a pilot test of your experiment. A pilot test is a small-scale study conducted to make sure that a new procedure works as planned. In a pilot test, you can recruit participants formally (e.g., from an established participant pool) or you can recruit them informally from among family, friends, classmates, and so on. The number of participants can be small, but it should be enough to give you confidence that your procedure works as planned. There are several important questions that you can answer by conducting a pilot test:

·         Do participants understand the instructions?

·         What kind of misunderstandings do participants have, what kind of mistakes do they make, and what kind of questions do they ask?

·         Do participants become bored or frustrated?

·         Is an indirect manipulation effective? (You will need to include a manipulation check.)

·         Can participants guess the research question or hypothesis?

·         How long does the procedure take?

·         Are computer programs or other automated procedures working properly?

·         Are data being recorded correctly?

Of course, to answer some of these questions you will need to observe participants carefully during the procedure and talk with them about it afterward. Participants are often hesitant to criticize a study in front of the researcher, so be sure they understand that this is a pilot test and you are genuinely interested in feedback that will help you improve the procedure. If the procedure works as planned, then you can proceed with the actual study. If there are problems to be solved, you can solve them, pilot test the new procedure, and continue with this process until you are ready to proceed.

Key Takeaways

·         There are several effective methods you can use to recruit research participants for your experiment, including through formal subject pools, advertisements, and personal appeals. Field experiments require well-defined participant selection procedures.

·         It is important to standardize experimental procedures to minimize extraneous variables, including experimenter expectancy effects.

·         It is important to conduct one or more small-scale pilot tests of an experiment to be sure that the procedure works as planned.

References from Chapter 9

Birnbaum, M. H. (1999). How to show that 9 > 221: Collect judgments in a between-subjects design. Psychological Methods, 4, 243–249.

Cialdini, R. (2005, April). Don’t throw in the towel: Use social influence research. APS Observer. Retrieved from  http://www.psychologicalscience.org/observer/getArticle.cfm?id=1762 .

Fredrickson, B. L., Roberts, T.-A., Noll, S. M., Quinn, D. M., & Twenge, J. M. (1998). The swimsuit becomes you: Sex differences in self-objectification, restrained eating, and math performance. Journal of Personality and Social Psychology, 75, 269–284.

Guéguen, N., & de Gail, M.-A. (2003). The effect of smiling on helping behavior: Smiling and good Samaritan behavior. Communication Reports, 16, 133–140.

Ibolya, K., Brake, A., & Voss, U. (2004). The effect of experimenter characteristics on pain reports in women and men. Pain, 112, 142–147.

Moseley, J. B., O’Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., … & Wray, N. P. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. The New England Journal of Medicine, 347, 81–88.

Price, D. D., Finniss, D. G., & Benedetti, F. (2008). A comprehensive review of the placebo effect: Recent advances and current thought. Annual Review of Psychology, 59, 565–590.

Rosenthal, R. (1976). Experimenter effects in behavioral research (enlarged ed.). New York, NY: Wiley.

Rosenthal, R., & Fode, K. (1963). The effect of experimenter bias on performance of the albino rat. Behavioral Science, 8, 183–189.

Rosenthal, R., & Rosnow, R. L. (1976). The volunteer subject. New York, NY: Wiley.

Shapiro, A. K., & Shapiro, E. (1999). The powerful placebo: From ancient priest to modern physician. Baltimore, MD: Johns Hopkins University Press.

Stanovich, K. E. (2010). How to think straight about psychology (9th ed.). Boston, MA: Allyn & Bacon.


10.3 The Single-Subject Versus Group “Debate”

Learning Objectives

  • Explain some of the points of disagreement between advocates of single-subject research and advocates of group research.
  • Identify several situations in which single-subject research would be appropriate and several others in which group research would be appropriate.

Single-subject research is similar to group research—especially experimental group research—in many ways. They are both quantitative approaches that try to establish causal relationships by manipulating an independent variable, measuring a dependent variable, and controlling extraneous variables. But there are important differences between these approaches too, and these differences sometimes lead to disagreements. It is worth addressing the most common points of disagreement between single-subject researchers and group researchers and how these disagreements can be resolved. As we will see, single-subject research and group research are probably best conceptualized as complementary approaches.

Data Analysis

One set of disagreements revolves around the issue of data analysis. Some advocates of group research worry that visual inspection is inadequate for deciding whether and to what extent a treatment has affected a dependent variable. One specific concern is that visual inspection is not sensitive enough to detect weak effects. A second is that visual inspection can be unreliable, with different researchers reaching different conclusions about the same set of data (Danov & Symons, 2008). A third is that the results of visual inspection—an overall judgment of whether or not a treatment was effective—cannot be clearly and efficiently summarized or compared across studies (unlike the measures of relationship strength typically used in group research).

In general, single-subject researchers share these concerns. However, they also argue that their use of the steady state strategy, combined with their focus on strong and consistent effects, minimizes most of them. If the effect of a treatment is difficult to detect by visual inspection because the effect is weak or the data are noisy, then single-subject researchers look for ways to increase the strength of the effect or reduce the noise in the data by controlling extraneous variables (e.g., by administering the treatment more consistently). If the effect is still difficult to detect, then they are likely to consider it neither strong enough nor consistent enough to be of further interest. Many single-subject researchers also point out that statistical analysis is becoming increasingly common and that many of them are using it as a supplement to visual inspection—especially for the purpose of comparing results across studies (Scruggs & Mastropieri, 2001).
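As one illustration of such a supplement, the sketch below (Python, with invented session counts) computes a simple standardized mean difference between a baseline phase and a treatment phase. This is only one of several effect size measures that have been proposed for single-subject data:

```python
import numpy as np

# Hypothetical single-subject data: problem behaviors observed per session
baseline = np.array([9, 8, 10, 9, 8])
treatment = np.array([4, 3, 3, 2, 2])

# One simple supplement to visual inspection: a standardized mean difference
# between phases, scaled by the baseline standard deviation
d = (baseline.mean() - treatment.mean()) / baseline.std(ddof=1)
print(round(d, 2))
```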

Turning the tables, some advocates of single-subject research worry about the way that group researchers analyze their data. Specifically, they point out that focusing on group means can be highly misleading. Again, imagine that a treatment has a strong positive effect on half the people exposed to it and an equally strong negative effect on the other half. In a traditional between-subjects experiment, the positive effect on half the participants in the treatment condition would be statistically cancelled out by the negative effect on the other half. The mean for the treatment group would then be the same as the mean for the control group, making it seem as though the treatment had no effect when in fact it had a strong effect on every single participant!

But again, group researchers share this concern. Although they do focus on group statistics, they also emphasize the importance of examining distributions of individual scores. For example, if some participants were positively affected by a treatment and others negatively affected by it, this would produce a bimodal distribution of scores and could be detected by looking at a histogram of the data. The use of within-subjects designs is another strategy that allows group researchers to observe effects at the individual level and even to specify what percentage of individuals exhibit strong, medium, weak, and even negative effects.
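The following Python sketch (with simulated change scores) shows how a histogram can reveal what a group mean conceals: half the sample improves, half gets worse, the mean is near zero, but the distribution is clearly bimodal:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical change scores: half the sample improves, half gets worse
change = np.concatenate([rng.normal(5, 1, 50), rng.normal(-5, 1, 50)])

print(round(change.mean(), 2))   # near zero: the group mean hides the effect
plt.hist(change, bins=20)        # the bimodal shape reveals the two subgroups
plt.xlabel("Change in outcome")
plt.ylabel("Number of participants")
plt.show()
```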

External Validity

The second issue about which single-subject and group researchers sometimes disagree has to do with external validity—the ability to generalize the results of a study beyond the people and situation actually studied. In particular, advocates of group research point out the difficulty in knowing whether results for just a few participants are likely to generalize to others in the population. Imagine, for example, that in a single-subject study, a treatment has been shown to reduce self-injury for each of two developmentally disabled children. Even if the effect is strong for these two children, how can one know whether this treatment is likely to work for other developmentally disabled children?

Again, single-subject researchers share this concern. In response, they note that the strong and consistent effects they are typically interested in—even when observed in small samples—are likely to generalize to others in the population. Single-subject researchers also note that they place a strong emphasis on replicating their research results. When they observe an effect with a small sample of participants, they typically try to replicate it with another small sample—perhaps with a slightly different type of participant or under slightly different conditions. Each time they observe similar results, they rightfully become more confident in the generality of those results. Single-subject researchers can also point to the fact that the principles of classical and operant conditioning—most of which were discovered using the single-subject approach—have been successfully generalized across an incredibly wide range of species and situations.

And again turning the tables, single-subject researchers have concerns of their own about the external validity of group research. One extremely important point they make is that studying large groups of participants does not entirely solve the problem of generalizing to other individuals. Imagine, for example, a treatment that has been shown to have a small positive effect on average in a large group study. It is likely that although many participants exhibited a small positive effect, others exhibited a large positive effect, and still others exhibited a small negative effect. When it comes to applying this treatment to another large group, we can be fairly sure that it will have a small effect on average. But when it comes to applying this treatment to another individual, we cannot be sure whether it will have a small, a large, or even a negative effect. Another point that single-subject researchers make is that group researchers also face a similar problem when they study a single situation and then generalize their results to other situations. For example, researchers who conduct a study on the effect of cell phone use on drivers on a closed oval track probably want to apply their results to drivers in many other real-world driving situations. But notice that this requires generalizing from a single situation to a population of situations. Thus the ability to generalize is based on much more than just the sheer number of participants one has studied. It requires a careful consideration of the similarity of the participants and situations studied to the population of participants and situations that one wants to generalize to (Shadish, Cook, & Campbell, 2002).

Single-Subject and Group Research as Complementary Methods

As with quantitative and qualitative research, it is probably best to conceptualize single-subject research and group research as complementary methods that have different strengths and weaknesses and that are appropriate for answering different kinds of research questions (Kazdin, 1982). Single-subject research is particularly good for testing the effectiveness of treatments on individuals when the focus is on strong, consistent, and biologically or socially important effects. It is especially useful when the behavior of particular individuals is of interest. Clinicians who work with only one individual at a time may find that it is their only option for doing systematic quantitative research.

Group research, on the other hand, is good for testing the effectiveness of treatments at the group level. Among the advantages of this approach is that it allows researchers to detect weak effects, which can be of interest for many reasons. For example, finding a weak treatment effect might lead to refinements of the treatment that eventually produce a larger and more meaningful effect. Group research is also good for studying interactions between treatments and participant characteristics. For example, if a treatment is effective for those who are high in motivation to change and ineffective for those who are low in motivation to change, then a group design can detect this much more efficiently than a single-subject design. Group research is also necessary to answer questions that cannot be addressed using the single-subject approach, including questions about independent variables that cannot be manipulated (e.g., number of siblings, extroversion, culture).

Finally, it is important to understand that the single-subject and group approaches represent different research traditions. This factor is probably the most important one affecting which approach a researcher uses. Researchers in the experimental analysis of behavior and applied behavior analysis learn to conceptualize their research questions in ways that are amenable to the single-subject approach. Researchers in most other areas of psychology learn to conceptualize their research questions in ways that are amenable to the group approach. At the same time, there are many topics in psychology in which research from the two traditions has informed each other and been successfully integrated. One example is research suggesting that both animals and humans have an innate “number sense”—an awareness of how many objects or events of a particular type they have experienced without actually having to count them (Dehaene, 2011). Single-subject research with rats and birds and group research with human infants have shown strikingly similar abilities in those populations to discriminate small numbers of objects and events. This number sense—which probably evolved long before humans did—may even be the foundation of humans’ advanced mathematical abilities.

Key Takeaways

  • Differences between single-subject research and group research sometimes lead to disagreements between single-subject and group researchers. These disagreements center on the issues of data analysis and external validity (especially generalization to other people).
  • Single-subject research and group research are probably best seen as complementary methods, with different strengths and weaknesses, that are appropriate for answering different kinds of research questions.
  • Discussion: Imagine you have conducted a single-subject study showing a positive effect of a treatment on the behavior of a man with social anxiety disorder. Your research has been criticized on the grounds that it cannot be generalized to others. How could you respond to this criticism?
  • Discussion: Imagine you have conducted a group study showing a positive effect of a treatment on the behavior of a group of people with social anxiety disorder, but your research has been criticized on the grounds that “average” effects cannot be generalized to individuals. How could you respond to this criticism?
  • Practice: Redesign the study by Hall and his colleagues described at the beginning of this chapter as a group study, and list the strengths and weaknesses of your new study compared with the original study.
  • Practice: The generation effect refers to the fact that people who generate information as they are learning it (e.g., by self-testing) recall it better later than do people who simply review information. Design a single-subject study on the generation effect applied to college students learning brain anatomy.

Danov, S. E., & Symons, F. E. (2008). A survey evaluation of the reliability of visual inspection and functional analysis graphs. Behavior Modification, 32, 828–839.

Dehaene, S. (2011). The number sense: How the mind creates mathematics (2nd ed.). New York, NY: Oxford University Press.

Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.

Scruggs, T. E., & Mastropieri, M. A. (2001). How to summarize single-participant research: Ideas and applications. Exceptionality, 9, 227–244.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Chapter 6: Experimental Research

6.1 Experiment Basics

Learning Objectives

  • Explain what an experiment is and recognize examples of studies that are experiments and studies that are not experiments.
  • Explain what internal validity is and why experiments are considered to be high in internal validity.
  • Explain what external validity is and evaluate studies in terms of their external validity.
  • Distinguish between the manipulation of the independent variable and control of extraneous variables and explain the importance of each.
  • Recognize examples of confounding variables and explain how they affect the internal validity of a study.

What Is an Experiment?

As we saw earlier in the book, an experiment is a type of study designed specifically to answer the question of whether there is a causal relationship between two variables. Do changes in an independent variable cause changes in a dependent variable? Experiments have two fundamental features. The first is that the researchers manipulate, or systematically vary, the level of the independent variable. The different levels of the independent variable are called conditions. For example, in Darley and Latané’s experiment, the independent variable was the number of witnesses that participants believed to be present. The researchers manipulated this independent variable by telling participants that there were either one, two, or five other students involved in the discussion, thereby creating three conditions. The second fundamental feature of an experiment is that the researcher controls, or minimizes the variability in, variables other than the independent and dependent variable. These other variables are called extraneous variables. Darley and Latané tested all their participants in the same room, exposed them to the same emergency situation, and so on. They also randomly assigned their participants to conditions so that the three groups would be similar to each other to begin with. Notice that although the words manipulation and control have similar meanings in everyday language, researchers make a clear distinction between them. They manipulate the independent variable by systematically changing its levels and control other variables by holding them constant.

Internal and External Validity

Internal Validity

Recall that the fact that two variables are statistically related does not necessarily mean that one causes the other. “Correlation does not imply causation.” For example, if it were the case that people who exercise regularly are happier than people who do not exercise regularly, this would not necessarily mean that exercising increases people’s happiness. It could mean instead that greater happiness causes people to exercise (the directionality problem) or that something like better physical health causes people to exercise and be happier (the third-variable problem).

The purpose of an experiment, however, is to show that two variables are statistically related and to do so in a way that supports the conclusion that the independent variable caused any observed differences in the dependent variable. The basic logic is this: If the researcher creates two or more highly similar conditions and then manipulates the independent variable to produce just one difference between them, then any later difference between the conditions must have been caused by the independent variable. For example, because the only difference between Darley and Latané’s conditions was the number of students that participants believed to be involved in the discussion, this must have been responsible for differences in helping between the conditions.

An empirical study is said to be high in internal validity if the way it was conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Thus experiments are high in internal validity because the way they are conducted—with the manipulation of the independent variable and the control of extraneous variables—provides strong support for causal conclusions.

External Validity

At the same time, the way that experiments are conducted sometimes leads to a different kind of criticism. Specifically, the need to manipulate the independent variable and control extraneous variables means that experiments are often conducted under conditions that seem artificial or unlike “real life” (Stanovich, 2010). In many psychology experiments, the participants are all college undergraduates and come to a classroom or laboratory to fill out a series of paper-and-pencil questionnaires or to perform a carefully designed computerized task. Consider, for example, an experiment in which researcher Barbara Fredrickson and her colleagues had college students come to a laboratory on campus and complete a math test while wearing a swimsuit (Fredrickson, Roberts, Noll, Quinn, & Twenge, 1998). At first, this might seem silly. When will college students ever have to complete math tests in their swimsuits outside of this experiment?

The issue we are confronting is that of external validity. An empirical study is high in external validity if the way it was conducted supports generalizing the results to people and situations beyond those actually studied. As a general rule, studies are higher in external validity when the participants and the situation studied are similar to those that the researchers want to generalize to. Imagine, for example, that a group of researchers is interested in how shoppers in large grocery stores are affected by whether breakfast cereal is packaged in yellow or purple boxes. Their study would be high in external validity if they studied the decisions of ordinary people doing their weekly shopping in a real grocery store. If the shoppers bought much more cereal in purple boxes, the researchers would be fairly confident that this would be true for other shoppers in other stores. Their study would be relatively low in external validity, however, if they studied a sample of college students in a laboratory at a selective college who merely judged the appeal of various colors presented on a computer screen. If the students judged purple to be more appealing than yellow, the researchers would not be very confident that this is relevant to grocery shoppers’ cereal-buying decisions.

We should be careful, however, not to draw the blanket conclusion that experiments are low in external validity. One reason is that experiments need not seem artificial. Consider that Darley and Latané’s experiment provided a reasonably good simulation of a real emergency situation. Or consider field experiments that are conducted entirely outside the laboratory. In one such experiment, Robert Cialdini and his colleagues studied whether hotel guests choose to reuse their towels for a second day as opposed to having them washed as a way of conserving water and energy (Cialdini, 2005). These researchers manipulated the message on a card left in a large sample of hotel rooms. One version of the message emphasized showing respect for the environment, another emphasized that the hotel would donate a portion of their savings to an environmental cause, and a third emphasized that most hotel guests choose to reuse their towels. The result was that guests who received the message that most hotel guests choose to reuse their towels reused their own towels substantially more often than guests receiving either of the other two messages. Given the way they conducted their study, it seems very likely that their result would hold true for other guests in other hotels.

A second reason not to draw the blanket conclusion that experiments are low in external validity is that they are often conducted to learn about psychological processes that are likely to operate in a variety of people and situations. Let us return to the experiment by Fredrickson and colleagues. They found that the women in their study, but not the men, performed worse on the math test when they were wearing swimsuits. They argued that this was due to women’s greater tendency to objectify themselves—to think about themselves from the perspective of an outside observer—which diverts their attention away from other tasks. They argued, furthermore, that this process of self-objectification and its effect on attention is likely to operate in a variety of women and situations—even if none of them ever finds herself taking a math test in her swimsuit.

Manipulation of the Independent Variable

Again, to manipulate an independent variable means to change its level systematically so that different groups of participants are exposed to different levels of that variable, or the same group of participants is exposed to different levels at different times. For example, to see whether expressive writing affects people’s health, a researcher might instruct some participants to write about traumatic experiences and others to write about neutral experiences. The different levels of the independent variable are referred to as conditions, and researchers often give the conditions short descriptive names to make it easy to talk and write about them. In this case, the conditions might be called the “traumatic condition” and the “neutral condition.”

Notice that the manipulation of an independent variable must involve the active intervention of the researcher. Comparing groups of people who differ on the independent variable before the study begins is not the same as manipulating that variable. For example, a researcher who compares the health of people who already keep a journal with the health of people who do not keep a journal has not manipulated this variable and therefore not conducted an experiment. This is important because groups that already differ in one way at the beginning of a study are likely to differ in other ways too. For example, people who choose to keep journals might also be more conscientious, more introverted, or less stressed than people who do not. Therefore, any observed difference between the two groups in terms of their health might have been caused by whether or not they keep a journal, or it might have been caused by any of the other differences between people who do and do not keep journals. Thus the active manipulation of the independent variable is crucial for eliminating the third-variable problem.

Of course, there are many situations in which the independent variable cannot be manipulated for practical or ethical reasons and therefore an experiment is not possible. For example, whether or not people have a significant early illness experience cannot be manipulated, making it impossible to do an experiment on the effect of early illness experiences on the development of hypochondriasis. This does not mean it is impossible to study the relationship between early illness experiences and hypochondriasis—only that it must be done using nonexperimental approaches. We will discuss this in detail later in the book.

In many experiments, the independent variable is a construct that can only be manipulated indirectly. For example, a researcher might try to manipulate participants’ stress levels indirectly by telling some of them that they have five minutes to prepare a short speech that they will then have to give to an audience of other participants. In such situations, researchers often include a manipulation check in their procedure. A manipulation check is a separate measure of the construct the researcher is trying to manipulate. For example, researchers trying to manipulate participants’ stress levels might give them a paper-and-pencil stress questionnaire or take their blood pressure—perhaps right after the manipulation or at the end of the procedure—to verify that they successfully manipulated this variable.
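As a rough illustration, and with entirely invented data, the following Python sketch shows one way a manipulation check might be analyzed: self-reported stress ratings are compared across the speech and no-speech conditions to verify that the manipulation actually raised stress. The group labels and rating scale are assumptions for the example.

```python
# Hypothetical manipulation check with invented stress ratings (1-10 scale),
# collected right after participants learn whether they must give a speech.
from statistics import mean
from scipy import stats

speech_group    = [7, 8, 6, 9, 7, 8, 5, 7, 8, 6]   # told to prepare a speech
no_speech_group = [4, 3, 5, 4, 2, 5, 3, 4, 4, 3]   # no speech instruction

result = stats.ttest_ind(speech_group, no_speech_group)
print(f"speech mean = {mean(speech_group):.1f}, "
      f"control mean = {mean(no_speech_group):.1f}, p = {result.pvalue:.4f}")
# A clear difference suggests the stress manipulation worked; if the groups
# did not differ, a null result on the dependent variable would be ambiguous.
```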

Control of Extraneous Variables

An extraneous variable is anything that varies in the context of a study other than the independent and dependent variables. In an experiment on the effect of expressive writing on health, for example, extraneous variables would include participant variables (individual differences) such as their writing ability, their diet, and their shoe size. They would also include situation or task variables such as the time of day when participants write, whether they write by hand or on a computer, and the weather. Extraneous variables pose a problem because many of them are likely to have some effect on the dependent variable. For example, participants’ health will be affected by many things other than whether or not they engage in expressive writing. This can make it difficult to separate the effect of the independent variable from the effects of the extraneous variables, which is why it is important to control extraneous variables by holding them constant.

Extraneous Variables as “Noise”

Extraneous variables make it difficult to detect the effect of the independent variable in two ways. One is by adding variability or “noise” to the data. Imagine a simple experiment on the effect of mood (happy vs. sad) on the number of happy childhood events people are able to recall. Participants are put into a negative or positive mood (by showing them a happy or sad video clip) and then asked to recall as many happy childhood events as they can. The two leftmost columns of Table 6.1 “Hypothetical Noiseless Data and Realistic Noisy Data” show what the data might look like if there were no extraneous variables and the number of happy childhood events participants recalled was affected only by their moods. Every participant in the happy mood condition recalled exactly four happy childhood events, and every participant in the sad mood condition recalled exactly three. The effect of mood here is quite obvious. In reality, however, the data would probably look more like those in the two rightmost columns of Table 6.1 “Hypothetical Noiseless Data and Realistic Noisy Data”. Even in the happy mood condition, some participants would recall fewer happy memories because they have fewer to draw on, use less effective strategies, or are less motivated. And even in the sad mood condition, some participants would recall more happy childhood memories because they have more happy memories to draw on, they use more effective recall strategies, or they are more motivated. Although the mean difference between the two groups is the same as in the idealized data, this difference is much less obvious in the context of the greater variability in the data. Thus one reason researchers try to control extraneous variables is so their data look more like the idealized data in Table 6.1 “Hypothetical Noiseless Data and Realistic Noisy Data”, which makes the effect of the independent variable easier to detect (although real data never look quite that good).

Table 6.1 Hypothetical Noiseless Data and Realistic Noisy Data
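Because the idea behind Table 6.1 is easier to grasp with numbers attached, here is a small Python sketch contrasting idealized, noiseless data with realistic, noisy data. The noisy values are invented for this illustration rather than taken from the table; both data sets share the same one-event mean difference.

```python
# Illustrative sketch of Table 6.1's idea with made-up numbers: the same
# one-event mean difference is obvious without noise and murky with noise.
from statistics import mean, stdev

# Idealized ("noiseless") data: mood is the only influence on recall.
happy_ideal = [4, 4, 4, 4, 4, 4]
sad_ideal   = [3, 3, 3, 3, 3, 3]

# Realistic ("noisy") data: extraneous variables (available memories,
# strategy, motivation) add person-to-person variability around the same means.
happy_noisy = [6, 2, 5, 4, 3, 4]
sad_noisy   = [5, 1, 4, 3, 2, 3]

for label, happy, sad in [("noiseless", happy_ideal, sad_ideal),
                          ("noisy", happy_noisy, sad_noisy)]:
    print(f"{label}: difference in means = {mean(happy) - mean(sad):.1f}, "
          f"within-group SD = {stdev(happy):.1f}")
# Both data sets show a 1.0-event difference, but in the noisy data that
# difference is small relative to the spread within each group.
```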

One way to control extraneous variables is to hold them constant. This can mean holding situation or task variables constant by testing all participants in the same location, giving them identical instructions, treating them in the same way, and so on. It can also mean holding participant variables constant. For example, many studies of language limit participants to right-handed people, who generally have their language areas isolated in their left cerebral hemispheres. Left-handed people are more likely to have their language areas isolated in their right cerebral hemispheres or distributed across both hemispheres, which can change the way they process language and thereby add noise to the data.

In principle, researchers can control extraneous variables by limiting participants to one very specific category of person, such as 20-year-old, straight, female, right-handed, sophomore psychology majors. The obvious downside to this approach is that it would lower the external validity of the study—in particular, the extent to which the results can be generalized beyond the people actually studied. For example, it might be unclear whether results obtained with a sample of younger straight women would apply to older gay men. In many situations, the advantages of a diverse sample outweigh the reduction in noise achieved by a homogeneous one.

Extraneous Variables as Confounding Variables

The second way that extraneous variables can make it difficult to detect the effect of the independent variable is by becoming confounding variables. A confounding variable is an extraneous variable that differs on average across levels of the independent variable. For example, in almost all experiments, participants’ intelligence quotients (IQs) will be an extraneous variable. But as long as there are participants with lower and higher IQs at each level of the independent variable so that the average IQ is roughly equal, then this variation is probably acceptable (and may even be desirable). What would be bad, however, would be for participants at one level of the independent variable to have substantially lower IQs on average and participants at another level to have substantially higher IQs on average. In this case, IQ would be a confounding variable.

To confound means to confuse, and this is exactly what confounding variables do. Because they differ across conditions—just like the independent variable—they provide an alternative explanation for any observed difference in the dependent variable. Figure 6.1 “Hypothetical Results From a Study on the Effect of Mood on Memory” shows the results of a hypothetical study, in which participants in a positive mood condition scored higher on a memory task than participants in a negative mood condition. But if IQ is a confounding variable—with participants in the positive mood condition having higher IQs on average than participants in the negative mood condition—then it is unclear whether it was the positive moods or the higher IQs that caused participants in the first condition to score higher. One way to avoid confounding variables is by holding extraneous variables constant. For example, one could prevent IQ from becoming a confounding variable by limiting participants only to those with IQs of exactly 100. But this approach is not always desirable for reasons we have already discussed. A second and much more general approach—random assignment to conditions—will be discussed in detail shortly.

Figure 6.1 Hypothetical Results From a Study on the Effect of Mood on Memory. Because IQ also differs across conditions, it is a confounding variable.
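To see how random assignment removes this kind of confound, here is a short, hypothetical Python sketch; the IQ scores are simulated and the recruitment scenario is an assumption for the example. A confounded assignment puts higher-IQ participants in the positive-mood condition, while random assignment makes the average IQs roughly equal across conditions.

```python
# Hypothetical sketch with simulated IQ scores: how a confound arises and how
# random assignment to conditions tends to remove it.
import random
from statistics import mean

random.seed(1)
iqs = [random.gauss(100, 15) for _ in range(100)]  # 100 would-be participants

# Confounded design: the brighter half ends up in the positive-mood condition
# (e.g., because they were recruited from an honors course).
ranked = sorted(iqs)
pos_confounded, neg_confounded = ranked[50:], ranked[:50]

# Randomized design: every participant has an equal chance of each condition.
random.shuffle(iqs)
pos_random, neg_random = iqs[:50], iqs[50:]

print(f"Confounded: mean IQ {mean(pos_confounded):.0f} vs {mean(neg_confounded):.0f}")
print(f"Randomized: mean IQ {mean(pos_random):.0f} vs {mean(neg_random):.0f}")
# With the confound, a memory difference could reflect mood or IQ; after random
# assignment, IQ no longer differs across conditions on average, so it no
# longer provides an alternative explanation.
```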

Key Takeaways

  • An experiment is a type of empirical study that features the manipulation of an independent variable, the measurement of a dependent variable, and control of extraneous variables.
  • Studies are high in internal validity to the extent that the way they are conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Experiments are generally high in internal validity because of the manipulation of the independent variable and control of extraneous variables.
  • Studies are high in external validity to the extent that the result can be generalized to people and situations beyond those actually studied. Although experiments can seem “artificial”—and low in external validity—it is important to consider whether the psychological processes under study are likely to operate in other people and situations.
  • Practice: List five variables that can be manipulated by the researcher in an experiment. List five variables that cannot be manipulated by the researcher in an experiment.

Practice: For each of the following topics, decide whether that topic could be studied using an experimental research design and explain why or why not.

  • Effect of parietal lobe damage on people’s ability to do basic arithmetic.
  • Effect of being clinically depressed on the number of close friendships people have.
  • Effect of group training on the social skills of teenagers with Asperger’s syndrome.
  • Effect of paying people to take an IQ test on their performance on that test.

Cialdini, R. (2005, April). Don’t throw in the towel: Use social influence research. APS Observer. Retrieved from http://www.psychologicalscience.org/observer/getArticle.cfm?id=1762

Fredrickson, B. L., Roberts, T.-A., Noll, S. M., Quinn, D. M., & Twenge, J. M. (1998). The swimsuit becomes you: Sex differences in self-objectification, restrained eating, and math performance. Journal of Personality and Social Psychology, 75, 269–284.

Stanovich, K. E. (2010). How to think straight about psychology (9th ed.). Boston, MA: Allyn & Bacon.

  • Research Methods in Psychology. Provided by: University of Minnesota Libraries Publishing. Located at: http://open.lib.umn.edu/psychologyresearchmethods. License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike


IMAGES

  1. 4. Setup for Single Participant Experiment and Co-participant Pair

    unlike the case study the single participant experiment ____

  2. An overview of the single-case study approach

    unlike the case study the single participant experiment ____

  3. PPT

    unlike the case study the single participant experiment ____

  4. Difference Between Case Study and Experiment

    unlike the case study the single participant experiment ____

  5. Single Participant Experiment

    unlike the case study the single participant experiment ____

  6. The study procedure following single-case experimental design

    unlike the case study the single participant experiment ____

VIDEO

  1. Correlation: Comparing theory with experiment (U1-9-04)

  2. Difference between observational studies and randomized experiments?

  3. دراسة الحالة Case report / case study

  4. Independent & dependent variables and controlled experiments

  5. BAR Lab Experiment

  6. Research Methods: The Experiment

COMMENTS

  1. PSYCH2050 Chapter 4 Flashcards

    Emily's answer constitutes a(n) ____., If the results of a study are due to factors other than those included in the research investigation, the study is said to have ____. and more. ... Unlike the case study, the single-participant experiment ____. cannot determine cause-and-effect relationships.

  2. Single-Case Experimental Designs: A Systematic Review of Published

    The single-case experiment has a storied history in psychology dating back to the field's founders: Fechner (1889), Watson (1925), and Skinner (1938).It has been used to inform and develop theory, examine interpersonal processes, study the behavior of organisms, establish the effectiveness of psychological interventions, and address a host of other research questions (for a review, see ...

  3. Psyc 340 Chapter 14 Flashcards

    Another name for single-case study is. ... Unlike single-case experiments, case studies usually involve uncontrolled impressionistic descriptions rather than controlled experimentation ... but each participant's data is analyzed separately and data is rarely averaged. Critics say that group experimental designs fail to adequately handle three ...

  4. Single-Case Experimental Design

    Unlike traditional group-based designs, this approach allows researchers to closely study and understand the nuances of a single participant's behavior, responses, and reactions over time. The precision and depth of insight offered by single-case experimental design have made it an invaluable tool in the field of psychology, facilitating both ...

  5. Single Subject Research

    An added benefit of this design, and all single-case designs, is the immediacy of the data. Instead of waiting until postintervention to take measures on the behavior, single-case research prescribes continuous data collection and visual monitoring of that data displayed graphically, allowing for immediate instructional decision-making.

  6. Case Study vs. Single-Case Experimental Designs

    One of the key differences between case studies and single-case experimental designs is their generalizability. Case studies are often conducted on unique or rare cases, making it challenging to generalize the findings to a larger population. The focus of case studies is on providing detailed insights into specific cases rather than making ...

  7. 10.5: Single-Subject Research (Summary)

    Key Takeaways. Single-subject research—which involves testing a small number of participants and focusing intensively on the behavior of each individual—is an important alternative to group research in psychology. Single-subject studies must be distinguished from qualitative research on a single person or small number of individuals.

  8. PDF Chapter 14. Experimental Designs: Single-Subject Designs and Time

    any conclusions that can be drawn. There are two serious problems with the case-study approach: (1) lack of experimental control, and (2) obtaining precise measures of behavior. Neither of these problems applies to the single-subject approach. The method is relatively popular today but it hasn't always been. Research in psychology started out

  9. 6.2 Experimental Design

    Random assignment is a method for assigning participants in a sample to the different conditions, and it is an important element of all experimental research in psychology and other fields too. In its strictest sense, random assignment should meet two criteria. One is that each participant has an equal chance of being assigned to each condition ...

  10. Single case studies are a powerful tool for developing ...

    In sum, the single case study approach allows for detailed testing within a participant, thereby avoiding the concern of 'averaging away' theoretically important differences.

  11. Single-Participant Research Designs

    Single-participant research designs involve the intensive study of one participant continuously or repeatedly across time. The participant may be a person or a single molar unit, such as an industrial organization or a political unit. As with conventional group designs based on reasonably large samples of subjects, single-participant designs ...

  12. Single Case Designs in Psychology Practice

    In single case designs, the systematic monitoring and evaluation positions the participant/client to adopt a problem-solving experiment, conjointly with the practitioner/field researcher on herself/himself. In so doing, the findings are immediate and directly applicable to their situation. 1, 4 In some cases, these approaches promote the ...

  13. PSYC 1300 Chapter 2 Flashcards

    Study with Quizlet and memorize flashcards containing terms like Unlike experimental research, correlational research cannot _____, A(n) _____ study is an in-depth analysis of an individual or small group of people., Research that involves determining the association between two variables or two sets of variables is called _____ research. and more.

  14. Single-Case Designs

    Single-case design (SCD), also known as single-subject design, single-case experimental design, or N-of-1 trials, refers to a research methodology that involves examining the effect of an intervention on an individual or on each of multiple individuals. Unlike case studies, SCDs involve the systematic manipulation of an independent variable (IV ...

  15. 6.3 Conducting Experiments

    In this case, with each trip down the stairs, the confederate was instructed to gaze at the first person he encountered who appeared to be between the ages of 20 and 50. Only if the person gazed back did he or she become a participant in the study. The point of having a well-defined selection rule is to avoid bias in the selection of participants.

  16. 10.3 The Single-Subject Versus Group "Debate"

    Single-subject research with rats and birds and group research with human infants have shown strikingly similar abilities in those populations to discriminate small numbers of objects and events. This number sense—which probably evolved long before humans did—may even be the foundation of humans' advanced mathematical abilities.

  17. Solved The case study and the single-participant experiment

    The case study and the single-participant experiment are two examples of ____. Group of answer choices. experimental studies. epidemiological research. the idiographic-orient. There are 3 steps to solve this one. Expert-verified. Share Share.

  18. Research Methods: Field Research

    Case Study. Sometimes a researcher wants to study one specific person or event. A case study is an in-depth analysis of a single event, situation, or individual. To conduct a case study, a researcher examines existing sources like documents and archival records, conducts interviews, or engages in direct observation and even participant ...

  19. Unlike the Case Study, the Single-Participant Experiment

    Unlike the case study, the single-participant experiment ____. A) cannot determine cause-and-effect relationships B) is based on the nomothetic research orientation C) is better able to assess cause-and-effect relationships D) is a correlational design

  20. Chapter 9: Simple Experiments

    Experiments have two fundamental features. The first is that the researchers manipulate, or systematically vary, the level of the independent variable. The different levels of the independent variable are called conditions. For example, in Darley and Latané's experiment, the independent variable was the number of witnesses that participants ...

  21. 10.3: The Single-Subject Versus Group "Debate"

    Single-Subject and Group Research as Complementary Methods. As with quantitative and qualitative research, it is probably best to conceptualize single-subject research and group research as complementary methods that have different strengths and weaknesses and that are appropriate for answering different kinds of research questions (Kazdin ...

  22. 10.3 The Single-Subject Versus Group "Debate"

    Single-subject research with rats and birds and group research with human infants have shown strikingly similar abilities in those populations to discriminate small numbers of objects and events. This number sense—which probably evolved long before humans did—may even be the foundation of humans' advanced mathematical abilities.

  23. 6.1 Experiment Basics

    Experiments have two fundamental features. The first is that the researchers manipulate, or systematically vary, the level of the independent variable. The different levels of the independent variable are called conditions. For example, in Darley and Latané's experiment, the independent variable was the number of witnesses that participants ...