Inter-Rater Reliability – Methods, Examples and Formulas

Inter-Rater Reliability

Definition:

Inter-rater reliability refers to the degree of agreement or consistency among different raters or observers when they independently assess or evaluate the same phenomenon, such as coding data, scoring tests, or rating behaviors. It is a measure of how reliable or consistent the judgments or ratings of multiple raters are.

Inter-rater reliability is particularly important in research studies, where multiple observers are often involved in data collection or evaluation. By assessing inter-rater reliability, researchers can determine the extent to which different raters agree on their judgments, which helps establish the validity and credibility of the data or measurements.

Also see Reliability

Inter-Rater Reliability Methods

There are several methods commonly used to assess inter-rater reliability. The choice of method depends on the nature of the data and the specific circumstances of the study. Here are some commonly used inter-rater reliability methods:

Cohen’s Kappa Coefficient

Cohen’s kappa is a widely used measure for categorical or nominal data. It takes into account both the agreement observed among raters and the agreement that could occur by chance. Kappa values range from -1 to 1, with values greater than 0 indicating agreement beyond chance.

Intraclass Correlation Coefficient (ICC)

The ICC is a popular measure for continuous or interval-level data. It quantifies the proportion of the total variance in the ratings that is attributable to true differences between the subjects being rated, rather than to differences between raters or measurement error. ICC values range from 0 to 1, with higher values indicating greater agreement among raters.

Fleiss’ Kappa

Fleiss’ kappa is an extension of Cohen’s kappa for situations involving multiple raters and more than two categories. It is commonly used when there are three or more raters providing categorical ratings for multiple subjects.

Pearson’s Correlation Coefficient

Pearson’s correlation coefficient assesses the linear relationship between two continuous variables. In the context of inter-rater reliability, it can be used to measure the consistency between the ratings assigned by two raters. Note that it captures relative consistency rather than absolute agreement: two raters can be perfectly correlated even if one systematically scores higher than the other.

Percentage Agreement

This simple method calculates the proportion of agreements between raters out of the total number of ratings. It is often used for categorical data or when the number of categories is small.

Gwet’s AC1

Gwet’s AC1 is an alternative to Cohen’s kappa that addresses some of its limitations, particularly kappa’s sensitivity to skewed category prevalence (the so-called kappa paradox, where kappa can be low even when observed agreement is high). It is suitable for categorical data with two or more raters.

Kendall’s W

Kendall’s W is a measure of agreement for ordinal data. It assesses the extent to which the rankings assigned by different raters agree with each other.
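
Several of the coefficients above are implemented in standard Python libraries, so they rarely need to be computed by hand. A minimal sketch using scikit-learn and statsmodels (the function names are real; the ratings below are hypothetical):

```python
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical categorical codes from two raters for the same six items
rater1 = ["pos", "neg", "neu", "pos", "pos", "neg"]
rater2 = ["pos", "neg", "pos", "pos", "neu", "neg"]
print(cohen_kappa_score(rater1, rater2))

# Hypothetical codes from three raters (rows = items, columns = raters),
# using integer category labels 0, 1, 2
ratings = [[0, 0, 1],
           [1, 1, 1],
           [0, 2, 0],
           [2, 2, 2]]
table, _ = aggregate_raters(ratings)  # per-item counts for each category
print(fleiss_kappa(table))
```

The formulas in the next section show what these coefficients calculate.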

Inter-Rater Reliability Formulas

Here are the formulas for some commonly used inter-rater reliability coefficients:

Cohen’s Kappa (κ):

κ = (Po – Pe) / (1 – Pe)

  • Po is the observed proportion of agreement among raters.
  • Pe is the proportion of agreement expected by chance.
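
A minimal Python sketch of this formula, computing Po and Pe directly from two raters’ labels (the helper function and the data are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical labels (hypothetical helper)."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Po: observed proportion of agreement
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Pe: chance agreement from each rater's marginal category proportions
    c1, c2 = Counter(rater1), Counter(rater2)
    pe = sum((c1[c] / n) * (c2[c] / n) for c in categories)
    return (po - pe) / (1 - pe)

# Example: two raters code the same 10 items as "yes" or "no"
r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
r2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(round(cohens_kappa(r1, r2), 3))  # 0.583
```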


Intraclass Correlation Coefficient (ICC):

ICC = (MSB – MSW) / (MSB + (k – 1) * MSW)

  • MSB is the mean square between subjects (variance attributable to differences between the subjects being rated).
  • MSW is the mean square within subjects (variance attributable to differences between raters and measurement error).
  • k is the number of raters per subject.

This is the one-way, single-measure form of the ICC; other forms (for example, two-way models that separate rater effects) use different combinations of mean squares.
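
A minimal NumPy sketch of this one-way ICC, assuming a complete subjects-by-raters matrix of scores (the helper function and the scores are hypothetical):

```python
import numpy as np

def icc_one_way(ratings):
    """One-way, single-measure ICC from an n_subjects x k_raters array
    (hypothetical helper)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    subject_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # MSB: mean square between subjects; MSW: mean square within subjects
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((ratings - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Example: 5 subjects each scored by 3 raters on a 1-10 scale
scores = [[7, 8, 7], [5, 5, 6], [9, 9, 8], [4, 5, 4], [8, 7, 8]]
print(round(icc_one_way(scores), 3))
```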


Fleiss’ Kappa (κ):

κ = (P – Pe) / (1 – Pe)

  • P is the mean observed proportion of agreement, averaged across all rated subjects.
  • Pe is the proportion of agreement expected by chance, computed from the overall proportion of ratings falling into each category.
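
A minimal sketch of this calculation from a subjects-by-categories table of rater counts (the helper function and the counts are hypothetical):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa from an n_subjects x n_categories table of rater counts;
    each row must sum to the number of raters (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    n_raters = counts[0].sum()
    # P: mean observed agreement across subjects
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Pe: chance agreement from the overall category proportions
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    pe = np.sum(p_j ** 2)
    return (p_bar - pe) / (1 - pe)

# Example: 4 subjects, 3 raters, 3 categories (each row = counts per category)
counts = [[3, 0, 0], [1, 2, 0], [0, 3, 0], [0, 1, 2]]
print(round(fleiss_kappa(counts), 3))  # 0.455
```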


Pearson’s Correlation Coefficient (r):

r = (Σ((X – X̄)(Y – Ȳ))) / (√(Σ(X – X̄)^2) * √(Σ(Y – Ȳ)^2))

  • X and Y are the paired ratings assigned by the two raters to the same subjects.
  • X̄ and Ȳ are the mean ratings of each rater.
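
A short sketch using NumPy’s correlation matrix (the scores are hypothetical):

```python
import numpy as np

# Hypothetical continuous scores from two raters for the same six essays
rater_x = [72, 85, 90, 64, 78, 88]
rater_y = [70, 83, 92, 60, 80, 86]

# np.corrcoef returns the 2 x 2 correlation matrix; element [0, 1] is r
r = np.corrcoef(rater_x, rater_y)[0, 1]
print(round(r, 3))
```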


Percentage Agreement:

  • Percentage Agreement = (Number of agreements) / (Total number of ratings) * 100
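
A short sketch of this calculation for two raters (the codes are hypothetical):

```python
# Hypothetical categorical codes from two raters for eight interview excerpts
r1 = ["A", "B", "B", "C", "A", "A", "C", "B"]
r2 = ["A", "B", "C", "C", "A", "B", "C", "B"]

agreements = sum(a == b for a, b in zip(r1, r2))
percent_agreement = agreements / len(r1) * 100
print(percent_agreement)  # 75.0
```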


Gwet’s AC1:

AC1 = (Po – Pe) / (1 – Pe)

  • Po is the observed proportion of agreement among raters.
  • Pe is the chance agreement computed with Gwet’s formula, based on the average prevalence of each category across raters; this differs from the Pe used in Cohen’s kappa.
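
A minimal two-rater sketch, using Gwet’s chance-agreement term Pe = Σ πc(1 – πc) / (K – 1), where πc is the average marginal proportion for category c and K is the number of categories (the helper function and the data are hypothetical):

```python
from collections import Counter

def gwet_ac1(rater1, rater2):
    """Gwet's AC1 for two raters' categorical labels (hypothetical helper)."""
    n = len(rater1)
    categories = sorted(set(rater1) | set(rater2))
    k = len(categories)
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    # pi: category prevalence averaged over the two raters
    c1, c2 = Counter(rater1), Counter(rater2)
    pi = {c: (c1[c] / n + c2[c] / n) / 2 for c in categories}
    # Pe as defined by Gwet (not the same as Cohen's chance agreement)
    pe = sum(p * (1 - p) for p in pi.values()) / (k - 1)
    return (po - pe) / (1 - pe)

# Example: skewed data (mostly "yes"), where AC1 exceeds Cohen's kappa
r1 = ["yes", "yes", "no", "yes", "yes", "yes", "no", "yes"]
r2 = ["yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"]
print(round(gwet_ac1(r1, r2), 3))  # 0.820
```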


Kendall’s W:

W = (12 * S) / (m^2 * (n^3 – n))

  • S is the sum of squared deviations of the subjects’ rank totals from their mean: S = Σ(Ri – R̄)^2, where Ri is the sum of the ranks assigned to subject i by all m raters.
  • m is the number of raters and n is the number of subjects being ranked (this form assumes no tied ranks).
  • W ranges from 0 (no agreement among the rankings) to 1 (complete agreement).
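
A minimal NumPy sketch of this coefficient from a raters-by-subjects matrix of ranks with no ties (the helper function and the ranks are hypothetical):

```python
import numpy as np

def kendalls_w(ranks):
    """Kendall's W from an m_raters x n_subjects array of ranks, assuming
    no tied ranks (hypothetical helper)."""
    ranks = np.asarray(ranks, dtype=float)
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)                    # Ri for each subject
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)  # sum of squared deviations
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Example: 3 raters each rank the same 4 proposals from 1 (best) to 4
ranks = [[1, 2, 3, 4],
         [2, 1, 3, 4],
         [1, 3, 2, 4]]
print(round(kendalls_w(ranks), 3))  # 0.778
```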

Inter-Rater Reliability Applications

Inter-rater reliability has various applications in research, assessments, and evaluations. Here are some common areas where inter-rater reliability is important:

  • Research Studies: Inter-rater reliability is crucial in research studies that involve multiple observers or raters. It ensures that different researchers or assessors are consistent in their judgments, ratings, or measurements. This is essential for establishing the validity and reliability of the data collected, and for ensuring that the results are not biased by individual raters.
  • Behavioral Observations: Inter-rater reliability is often assessed in studies that involve behavioral observations, such as coding behaviors in psychology, animal behavior studies, or social science research. Different observers independently rate or record behaviors, and inter-rater reliability ensures that their assessments are consistent, enhancing the accuracy of the findings.
  • Medical and Clinical Assessments: Inter-rater reliability is critical in medical and clinical settings where multiple healthcare professionals or experts assess patients, interpret diagnostic tests, or rate symptoms. Consistency among raters is important for making accurate diagnoses, determining treatment plans, and evaluating patient progress.
  • Performance Evaluations: In educational or workplace settings, inter-rater reliability is relevant for performance evaluations, grading, or scoring assessments. Multiple teachers, instructors, or supervisors may independently assess students or employees, and inter-rater reliability ensures fairness and consistency in the evaluation process.
  • Coding and Content Analysis: Inter-rater reliability is essential in qualitative research, especially when coding textual data or conducting content analysis. Multiple researchers independently code or categorize data, and inter-rater reliability helps establish the consistency of their interpretations and ensures the reliability of qualitative findings.
  • Standardized Testing: Inter-rater reliability is critical in standardized testing situations, such as scoring essay responses, open-ended questions, or performance-based assessments. Different examiners or scorers should agree on the scores assigned to ensure fairness and reliability in the assessment process.
  • Psychometrics and Scale Development: When developing new measurement scales or questionnaires, inter-rater reliability is assessed to determine the consistency of ratings assigned by different raters. This step ensures that the scale measures the intended constructs reliably and that the instrument can be used with confidence in future research or assessments.

Inter-Rater Reliability Examples

Here are a few examples that illustrate the application of inter-rater reliability in different contexts:

  • Behavioral Coding: In a study on child behavior, researchers want to assess the inter-rater reliability of two trained observers who independently code and categorize specific behaviors exhibited during play sessions. They record and compare their coding decisions to determine the level of agreement between the raters. This helps ensure that the behaviors are consistently and reliably classified, enhancing the credibility of the study.
  • Clinical Assessments: In a medical setting, multiple doctors independently review the same set of patient medical records to diagnose a specific condition. Inter-rater reliability is assessed by comparing their diagnoses to determine the degree of agreement. This process helps ensure consistent and reliable diagnoses, reducing the risk of misdiagnosis or subjective variations among practitioners.
  • Performance Evaluation: In an educational institution, a group of teachers assesses student presentations using a standardized rubric. Inter-rater reliability is calculated by comparing their ratings to determine the level of agreement. This evaluation process ensures fairness and consistency in grading, providing students with reliable feedback on their performance.
  • Scale Development: Researchers are developing a new questionnaire to measure job satisfaction. They ask a group of experts to independently rate a set of sample responses provided by employees. Inter-rater reliability is assessed to determine the level of agreement between the experts in assigning scores to the responses. This helps establish the reliability of the new questionnaire and ensures consistency in measuring job satisfaction.
  • Image Analysis: In a research study involving medical imaging, multiple radiologists independently analyze and interpret the same set of images to identify abnormalities or diagnose diseases. Inter-rater reliability is assessed by comparing their interpretations to determine the level of agreement. This analysis helps establish the consistency and reliability of the radiologists’ diagnoses, ensuring accurate patient assessments.

Advantages of Inter-Rater Reliability

Inter-rater reliability offers several advantages in research, assessments, and evaluations. Here are some key benefits:

  • Ensures Consistency: Inter-rater reliability ensures that different observers or raters are consistent in their judgments, ratings, or measurements. It helps reduce the potential for subjective biases or variations among raters, enhancing the reliability and objectivity of the data collected or assessments conducted.
  • Establishes Validity: By assessing inter-rater reliability, researchers can establish the validity of their measurements or observations. Consistent agreement among raters indicates that the measurement instrument or observation protocol is reliable and accurately captures the intended constructs or phenomena under study.
  • Increases Credibility: Inter-rater reliability enhances the credibility and trustworthiness of research findings or assessment results. When multiple raters independently produce consistent results, it strengthens the confidence in the data or evaluations, making the conclusions more robust and reliable.
  • Identifies Rater Biases: Assessing inter-rater reliability helps identify and address potential biases among raters. If there is low agreement or consistency among raters, it suggests the presence of factors influencing their judgments differently. This awareness allows researchers or evaluators to investigate and mitigate sources of bias, improving the overall quality of the assessments or measurements.
  • Quality Control: Inter-rater reliability serves as a quality control measure in data collection, assessments, or evaluations. It ensures that the process is standardized and that the data or assessments are conducted consistently across multiple raters. This enhances the reliability and comparability of the results obtained.
  • Supports Generalizability: Inter-rater reliability contributes to the generalizability of research findings or assessment outcomes. When multiple raters consistently produce similar results, it increases the likelihood that the findings can be generalized to a larger population or that the assessments can be applied in various contexts.
  • Facilitates Training and Calibration: Assessing inter-rater reliability can identify areas where additional training or calibration is needed among raters. It helps improve the consistency and agreement among raters through targeted training sessions, clearer guidelines, or revisions to measurement instruments. This leads to higher quality data and more reliable assessments.

Limitations of Inter-Rater Reliability

While inter-rater reliability is a valuable measure, it is important to be aware of its limitations. Here are some limitations associated with inter-rater reliability:

  • Subjectivity of Raters: Inter-rater reliability is influenced by the subjective judgments of individual raters. Different raters may have different interpretations, biases, or levels of expertise, which can affect their agreement. In some cases, subjective judgments may introduce variability and lower inter-rater reliability.
  • Lack of Objective Criteria: The reliability of judgments or ratings depends on the availability of clear and objective criteria or guidelines. If the criteria are ambiguous or open to interpretation, it can lead to disagreements among raters and lower inter-rater reliability. It is crucial to provide specific and well-defined criteria to minimize subjectivity.
  • Small Sample Sizes: In studies or assessments with a small number of observations or ratings, inter-rater reliability estimates may be less stable. With fewer instances of agreement or disagreement, the reliability coefficient can be more sensitive to variations, leading to less reliable estimates.
  • Variability in the Phenomenon: Inter-rater reliability assumes that the phenomenon being assessed is stable and consistent. However, if the phenomenon itself is inherently variable or prone to change, it can impact inter-rater reliability. For example, subjective ratings of complex human behaviors may show lower agreement due to the multifaceted nature of the behaviors.
  • Limited to the Specific Context: Inter-rater reliability is context-specific and may not generalize to other settings or populations. The agreement among raters may vary depending on the characteristics of the participants, the nature of the measurements, or the specific circumstances of the study. Caution should be exercised when applying inter-rater reliability estimates beyond the original context.
  • Does Not Capture Accuracy: Inter-rater reliability assesses the consistency or agreement among raters but does not necessarily measure accuracy. Raters may consistently agree with each other, but their judgments may be consistently inaccurate. It is important to consider both reliability and validity measures to ensure the accuracy of assessments or measurements.
  • Limited to Agreement: Inter-rater reliability focuses on the level of agreement among raters but may not capture other important aspects, such as the magnitude or severity of a phenomenon. It may not provide a complete picture of the data or allow for nuanced interpretations.



