• Pharmaceutical Engineering Magazine
  • Online Exclusives
  • Special Reports

Facilities & Equipment

Information systems, product development.

  • Production Systems
  • Quality Systems
  • Regulatory Compliance
  • Research + Development
  • Supply Chain Management
  • White Papers
  • iSpeak Blog
  • Editorial Calendar
  • Article of the Year
  • Submit an Article
  • Editorial Team

Navigating the Asia Pacific Pharmaceutical Landscape for Global Impact

The Asia Pacific region (APAC), like any large territory, encompasses a blend of well-established and early-stage economies, diverse healthcare systems, and differences in language, culture, politics, and technology adoption. APAC’s size and complexity has created new challenges and opportunities for the pharmaceutical industry as nations work together to meet the manufacturing needs for medical products.

Featured Articles

Evolving China’s Regulatory System in Alignment with ICH

With the Chinese government initiating drug regulatory reform in 2015 and China joining the International Council for Harmonisation (ICH) in 2017, a significant number of measures have been implemented by the government. The aim is to make fundamental changes to China’s drug regulatory administration system so it can facilitate pharmaceutical development and better meet patient needs in the...

Digital Display Labeling in Clinical Supplies for Clinical Trials

Digital display labels (DDLs) offer an alternative solution to eliminate manual relabeling in the clinical supply chain, optimizing label content updates through a simple, system-controlled approach while providing new, uncharted opportunities. With increased efficiency in making regulatory-compliant changes and enhanced flexibility in the clinical supply chain, DDL technology has the...

Examples of variables that represent the worst-case recipe(s) within a vessel or functionally equivalent vessels

The validation of media and buffer mixing is a continuing area of resource constraint in the pharmaceutical industry. These validations require materials, validation associates’ time, and the use of equipment and processing areas. This article proposes a risk-based life cycle for minimizing mixing validation resource inputs, with the objective of optimizing validation efforts through the use...

Computer Software Assurance and the Critical Thinking Approach

In 2022, the US Food and Drug Administration (FDA) issued their draft guidance “Computer Software Assurance for Production and Quality System Software”

Article

Pharmaceutical manufacturing facilities produce a variety of products, including highly potent products that require safety measures to prevent adverse health effects on patients and operators. To ensure safety, these facilities use containment equipment to minimize the risk of contamination. This article presents criteria for selecting containment equipment, considering both...

Figure 1: A: Portable electrochemical minicell tool for onsite stainless steel surface inspection. B: EIS data acquisition through Bluetooth connection.

Pharmaceutical critical utilities are typically built of 316L stainless steel; nevertheless, surface degradation has been reported due to the occurrence of different phenomena. This article aims to explain how field electrochemical techniques using a portable tool can be an effective method for surface inspection, qualification, and monitoring. The surface finish assessment considered...

A Sustainable Approach to Steam Quality Management

The world is beginning to grasp the huge challenge of achieving net-zero carbon emissions, or carbon neutrality, by 2050. Many countries have committed to achieving this ambitious goal. As a major global industry, the pharmaceutical sector has a significant role to play. For thermal energy–intensive industries, such as pharmaceutical manufacturing, the long-term future options to maintain...

CoP Leader Profile: Nathan Temple, PE

Although he didn’t know it at the time, Nathan Temple’s service as a naval officer on the USS Asheville in Hawaii prepared him perfectly for a career in pharmaceutical engineering. “You learn so much, so quickly, on a submarine.”

CoP Leader Profile: Tammy Spain, PhD, PMP

The impact people have on others’ lives is not always obvious. For many, the Advil mini pill might not seem like a big deal, but for people who have trouble swallowing pills, like cancer patients, the tiny pill has made a huge difference in their ability to manage pain. It’s projects like that, as well as treatments for bladder cancer and transthyretin amyloidosis, that Tammy Spain is most...

Quality Considerations in Disaster Recovery: A Case Study

Due to the growing digitalization of the industry, we are highly dependent on information technology (IT) systems and data. The basic ability to execute our pharmaceutical business and decision-making processes relies on the permanent availability of these IT systems and data to ensure compliance and efficiency of our business operations. But numerous factors—including criminal activities,...

The Use of Infrastructure as Code in Regulated Companies

IT infrastructure has traditionally been provisioned using a combination of scripts and manual processes. This manual approach was slow and introduced the risk of human error, resulting in inconsistency between environments or even leaving the infrastructure in an unqualified state. In this article, we investigate some fundamental advantages of using Infrastructure as Code (IaC) for...

Leveraging  GAMP®  5 Second Edition for Medical Devices

This article provides a brief introduction into the standards and regulations for medical devices. It compares the ISPE GAMP® 5 Guide: A Risk-Based Approach to Compliant GxP Computerized Systems (Second Edition) and applicable ISPE GAMP Good Practice Guides against the relevant regulations and standards for the development of software for medical devices and demonstrates GAMP® 5 Second...

Delivering Curative Therapies: Autologous vs. Allogeneic Supply Chains

Advanced therapy medicinal products (ATMPs) are one of the most promising developments in the pharmaceutical and biotech industries in recent decades. Although there is a great promise to treat and even cure many diseases with these products, there are also unique challenges, especially with their supply chains.

Optimizing Cost of Goods for Cell Therapy Manufacturing

Facility design decisions made early in conceptual design can have a significant impact on the cost of goods sold (COGS) in the manufacture of autologous and allogeneic cell therapy products. Understanding the impact of a COGS analysis is an important aspect of the early-phase design process.

Live Biotherapeutic Products: Moving the Microbiome to the Patient

Live biotherapeutic products (LBPs) have the potential to treat a wide range of ailments. However, these living microorganisms are difficult to produce due to evolving government regulations and limited GMP manufacturing experience. New facility designs and more specific process guidance could help overcome these challenges. This article explores the nuances of facility design and regulatory...

Comparability Considerations for Cellular & Gene Therapy Products

Cell and gene therapy (C&GT) products comprise a rapidly growing field of innovative medicines that hold the promise to treat and, in some cases, cure diseases that are otherwise untreatable. In this article, we provide points to consider when evaluating the comparability of C&GT when changes are made in their manufacturing processes.

An Evaluation of Postapproval CMC Change Timelines

As the demand for accelerated access to medicines expands globally, the pharmaceutical industry is increasingly submitting regulatory applications in multiple countries simultaneously. As a result, Boards of Health (BoHs) are challenged with approving these applications in an accelerated timeframe and accommodating the submission of postapproval chemistry, manufacturing, and controls (CMC)...

Air Speed Qualification: At Working Position or Working Level?

The new European Commission GMP Annex 1 “Manufacture of Sterile Medicinal Products” and the equivalent Annex 2 from the World Health Organization (WHO) triggered a discussion in ISPE’s Germany/Austria/Switzerland (D/A/CH) Aseptic Processing Community of Practice (CoP) Steering Committee about where to qualify air speed: “at working position” versus “at working level.” This article provides...

Considerations for a Decentralized Manufacturing Paradigm

The biopharmaceutical industry must develop and implement innovative ways of working to be effective and efficient in the current healthcare ecosystem, in which high-quality medicines, adaptability, and assurance of supply are of critical importance. There are regulatory strategies and technologies emerging to address these challenges, but further progress must be made to fully harness the...

New EU AI Regulation and Gamp® 5

This article describes how ISPE GAMP ® 5: A Risk-Based Approach to Compliant GxP Computerized Systems (Second Edition) and related GAMP Good Practice Guides can be effectively applied to help meet the requirements of the proposed European Union (EU) artificial intelligence (AI) regulation for qualifying GxP-regulated systems employing AI and machine learning (ML).

Enabling Global Pharma Innovation: Delivering for Patients

ISPE has launched an important new initiative, “Enabling Global Pharma Innovation: Delivering for Patients,” in support of the aspirations of many regulatory agencies globally to promote introduction of innovative pharmaceutical manufacturing.

ICH Q13 and What Is Next for Continuous Manufacturing

The creation of a new ICH guidance document, Q13, 1

  • 1 International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. “ICH Harmonised Tripartite Guideline Q13: Continuous Manufacturing of Drug Substances and Drug Products.” Published July 2021.

A Systemwide Approach to Managing the Risks of Continuous Manufacturing

Understanding and managing risks to continuous manufacturing (CM) technology is central to any decision to greenlight CM in a production-ready environment. Applying a systemwide risk management (SRM) approach to manufacturing is essential to ensuring manufacturing projects are vetted in a comprehensive and consistent manner.

Agile, Data-Driven Life Cycle Management for Continuous Manufacturing

Pharmaceutical continuous manufacturing (CM) is recognized as a key process intensification technology, with investment expected to rise in the coming years and the focus shifting toward biologics. This article provides a review on the current state of CM implementation, offers insights into life cycle management and regulatory aspects, and explains how a data- and knowledge-centric approach...

USP/BIOPHORUM Workshop on Continuous Manufacturing of Biologics

In the interest of understanding the current state of continuous manufacturing for biologics and to facilitate the path toward adoption of these promising technologies, the United States Pharmacopeia (USP) and BioPhorum jointly sponsored a hybrid workshop. This article summarizes trends from the workshop and ponders next steps.

Facilities & Equipment

Research & Development

Product Development

Latest Articles

2024 ISPE Aseptic Conference keynote Presentations

ISPE hosted more than 450 attendees in person and virtually for the 2024 ISPE Aseptic Conference in Vienna, Austria. Keynotes and education sessions provided a comprehensive overview of key topics and trends...

Industry 4.0 Machinery Production Facility

Biopharmaceutical facility design is a critical aspect of the industry. Overall cost pressures in the global health system, regional requirements to deploy manufacturing rather than centralize manufacturing in one location, technology enhancements in cell biology and format, flexibility to accommodate multi-product campaigns with different production schedules, and speed-to-market are among...

DNA strands-by Thomas

The biotechnology industry continues to evolve, advancing the frontiers of science and engineering in the design, development, and manufacture of a wide variety of therapeutic modalities. Advances in cell and gene therapies continue to offer treatments for diseases that were previously uncurable, with seven approvals in 2023, including treatments for sickle-cell disease, hemophilia, type 1...

EU23_ISPE_Executive Forum

The International Society of Pharmaceutical Engineering’s (ISPE) Women in Pharma ® group was established in 2017. Since its inception, it has grown to a community of more than 2,000 members who are actively engaged in educational, collaborative, and networking activities designed to bridge gender, cultural, organizational, and geographic boundaries, maximize the impact women have in...

2023 ISPE Annual Meeting & Expo Women in Pharma® event

Since starting with ISPE, Edyna Miguez, has had the incredible opportunity to connect with women across the globe working within the pharmaceutical industry, as part of the ISPE Women in Pharma ® program. The tenure of these rising female leaders varies, as do their cultural backgrounds. However, one fact remains consistent: they are looking to make the industry a more equitable and...

What you need to know about the GAMP track

The ISPE GAMP ® 5 Guide (Second Edition) was published on 29 July 2022. It was presented and discussed at the 2023 ISPE Europe Annual Conference, the 2023 ISPE Annual Meeting & Expo, and at several local...

EU23 Attendees

The importance of the 2024 ISPE Europe Annual Conference is evident with the continuous challenges the pharmaceutical industry is facing in terms of drug shortages, regulatory requirements, digitalization, automation, and big data management.

The Machine Learning Life Cycle (MLLC): Key Artifacts, Considerations, and Questions

It is an exciting time in life sciences. Many companies have focused initial artificial intelligence/machine learning (AI/ML) adoption efforts on developing and implementing ML trained algorithms, which necessitate following a controlled approach such as the GAMP® 5 ML Sub-System methodology shown below. This blog highlights key artifacts and considerations when implementing AI/ML for GxP use...

Practical Applications of Current Standards and Regulations in the Context of Digitalization

At the 2023 ISPE Pharma 4.0™ and Annex 1 Conference , held 11 – 12 December 2023 in Barcelona, Spain, a panel discussion that included regulators from around the world, as well as industry experts,...

Concluding Compliance Challenges with Validation 4.0

As the pharma industry moves to an ambitious Validation 4.0 paradigm, computerized systems play a pivotal role in enabling the rapid transition. Innovation and agility in computerized system validation (CSV) received a strong push in the second half of 2022 with the publication of the FDA draft guidance on “Computer Software Assurance for Production and Quality System Software”

Carla J. Lundi, US Food and Drug Administration (FDA) Senior Consumer Safety Officer, Division of Quality Intelligence II, FDA

Carla J. Lundi, US Food and Drug Administration (FDA) Senior Consumer Safety Officer, Division of Quality Intelligence II, FDA, presented the regulatory keynote at the 2024 ISPE Facilities of the Future...

ai-technology-microchip-Image by rawpixel.com on Freepik

Artificial Intelligence (AI) can be described as the application of computer science, statistics, and engineering, utilizing digital algorithms or models to analyze information, perform tasks and exhibit behaviors such as learning, making decisions, and making predictions.

Evdokia Korakianiti

One of the keynote sessions of the 2023 ISPE Pharma 4.0 TM and Annex 1 Conference held 11 - 12 December 2023 in Barcelona, Spain was a presentation on European Union (EU) initiatives to...

ISPE Board Member Jim Breen, and featuring industry leaders Melody Spradlin, Katrina Mosely, Patricia Martin, Carla Lundi, Deborah Donovan, and Muriel Campbell

The energy in the room was palpable as attendees eagerly awaited the start of what promised to be an enlightening and thought-provoking evening.

Asia Pacific Region Map - Image by Freepik

ISPE’s Regulatory Quality Harmonization Committee (RQHC) is structured with four Regional Focus Groups (RFGs): Asia-Pacific, Europe/Middle East/Africa, Latin America, and North America. The RFGs support ISPE in its understanding and response to regulatory issues and concerns for the specific region. The RQHC Asia-Pacific RFG includes pharmaceutical professionals located in or having specific...

How Men in the Industry Inspire Inclusion and Position Themselves as Allies

Aaron Bober is an active member of ISPE, the ISPE Women in Pharma® Boston Chapter, and the ISPE Women in Pharma International Steering Committee. He is based in the Boston, Massachusetts, area and serves as the Director of Engineering, New England, at IPS. Bober developed the below testimonial, recounting his personal experience and investment in supporting women in the pharmaceutical industry...

Celebrate International Women’s Day with ISPE’s Women in Pharma

With well over a century of history and change, the first International Women's Day was held in March 1911. International Women's Day is a day of collective global activism and celebration that belongs to all those committed to forging women's equality.

International Women’s Day Is 8 March

The new year is here and, with that, another International Women’s Day approaches! This incredibly important global movement, which takes place every 8 March, celebrates women’s achievements, raises awareness about discrimination, and advocates for accelerated equality and gender parity. ISPE’s Women in Pharma® community strives to accomplish all these goals through regional and international...

Ireland Affiliate Creates Opportunities for Members

Last year at the 2023 ISPE Annual Meeting & Expo , the

2023 ISPE Pharma 4.0™ and Annex 1: Keynote Presentations

The 2023 ISPE Pharma 4.0™ and Annex 1 Conference was held in Barcelona, Spain, 11–12 December. Topics discussed included transforming operations, quality, and maintenance with Pharma 4.0™ principles...

ISPE Affiliates and Chapters Span the Globe

ISPE has more than 21,000 members in more than 120 countries worldwide. As an ISPE member, you have access to this network, which can be exciting and overwhelming at the same time. Connecting at the local level unlocks the unique benefits that your local Affiliate or Chapter holds. Not only is this a career game changer, but it also opens doors of opportunity to experience your ISPE membership...

Guidance Documents

The newly released third edition of the ISPE Baseline® Guide: Volume 6 – Biopharmaceutical Manufacturing Facilities reinforces the concepts described in the second...

Emerging Leaders Editorial - Monique L. Sprueill

Personalized medicine provides a treatment alternative that utilizes patients’ genetic material to produce therapeutics. According to Market Research Future, the US currently accounts for the largest share of the personalized medicine market, and it is expected to reach US $27.5 million by 2030.

scientist-analyzing-liquid-using-protective-equipment-indoors-Image by vecstock on Freepik

In the pharmaceutical industry, which is highly regulated, aseptic processing is a critical component that ensures the sterility of products. Regulators have a set of comprehensive requirements that minimize the risk of contamination. Regulators set the requirements; however, the industry has an obligation to the patients who rely on and expect a drug that is safe and free of contamination....

Women in Pharma® Editorial - Fatima Jacoba Mancilla Islas

Over time and with effort and determination, women in key leadership positions have proven that these positions are genderless for individuals with the correct set of abilities and knowledge.

Mettler-APW-eBook_Hero-Image

Unleash a new level of machine innovation with modular and reliable Automated Precision Weighing (APW) family that serve mission-critical sectors where accuracy, repeatability and speed are top priorities. Bene昀椀t from the direct connectivity to nearly any automation system via PROFINET or EtherNet/IP networks.

WMFTS hero image

Transitioning from lab manufacture to commercialization presents significant challenges, no matter what area you’re working in. Cell and gene therapies are no different and production quantities must scale at multiple stages, first to support clinical trials and then when they reach the market. With uniquely challenging conditions required to protect medicinal integrity and exceptionally high...

Contec-banner

Environmental monitoring (EM) in pharmaceutical manufacturing is a critical component of ensuring product safety and efficacy. This article aims to unravel the considerations for optimal sample collection and data evaluation within pharmaceutical environmental monitoring programs, presenting this technical domain in an accessible and insightful manner.

Current Issue

Pharmaceutical Engineering

March / April 2024

Not a member yet.

Whether networking at events or collaborating through our Communities of Practice, the value of an ISPE membership is in the connections made between pharmaceutical industry professionals and Regulators to collaborate on solutions to common goals and challenges.

Join Today Member Benefits

ISPE Members Collaborating at Event

Call for Articles

Are you a subject-matter expert in the global pharmaceutical industry? Are you brimming with knowledge about the latest technical developments or regulatory initiatives? Have you found an innovative solution to a real-world challenge?

Contribute your knowledge and experience to the always-evolving pharmaceutical industry!

Article Guidelines Editorial Calendar

Do you know someone whose profile or work should be featured in Pharmaceutical Engineering ? Contact  ISPE Publications Department .

TF Resource

Model Validation and Reasonableness Checking/Assignment

Assignment Procedures

Traffic Assignment Checks

Transit Assignment Checks

Page categories

Model Calibration And Validation

Needs Review

# Assignment Procedures

Assignment is often viewed as the culmination of any modeling process, be it a traditional four-step modeling process or an activity-based modeling process. Many models now include feedback loops to "equilibrate" assigned travel speeds with travel speeds used for prior modeling steps such as trip distribution , destination choice , and mode choice. Nevertheless, the modeling process typically ends with the assignment step.

The assignment step includes both highway and transit assignments of vehicle and person trips respectively. While there are emerging assignment procedures such as dynamic traffic assignment (DTA) and regional simulation procedures, research into the integration of these emerging procedures and travel demand models is just now occurring.

Assignment validation is generally inseparable from the rest of the modeling process. This is especially true for traffic assignment since it is not feasible to collect sufficient survey data to construct an observed trip table for traffic assignment. For transit assignment, observed transit trip tables might be constructed from comprehensive on-board surveys such as those performed for the FTA New Starts analyses. 31

Assignment validation is an important step in validating not only the assignment process but the entire modeling process. Assignment validation typically benefits from a wealth of independent validation data including traffic counts and transit boardings collected independently of household or other survey data used for model estimation and, increasingly, from independent traffic speed and travel time studies. In addition, due to established traffic and transit counting programs in many regions, traffic and transit count data can be used for temporal validation of travel models (see Temporal Validation and Sensitivity ).

Unfortunately, as the culmination of the modeling process and due to the wealth of independent validation data, the assignment of trips to the network often becomes the primary basis for validating the a travel model's ability to replicate observed travel. In effect, assignment validation becomes a "super" data point defining a successful validation for many modelers and planners. While it is important that assignment validation be reasonable, highly accurate traffic and transit assignments in terms of matching observed traffic and transit volumes are not sufficient for proving the validity of travel models. In some cases, the over-emphasis on matching observed traffic volumes and transit boardings has led to poor model adjustments such as link specific changes to the network speeds and capacities and "fine-tuning" of connector links for better match between modeled and observed traffic volumes or transit boardings.

Since assignment techniques are not wedded to a specific modeling process, this chapter will be structure slightly differently from the other chapters in this manual. Specifically, it will focus first on traffic assignment validation and then on transit assignment validation.

# Traffic Assignment Checks

Both traditional and emerging traffic assignment procedures may be used for assignment. Traditional techniques may be characterized as procedures that represent trips on each interchange as being omnipresent on all links reasonably serving the interchange. Traditional techniques include static equilibrium assignment, other capacity-restrained assignment, stochastic multipath assignment, and all-or-nothing assignment. Static equilibrium assignment is probably the most frequently used traditional traffic assignment technique.

Capacity-restrained traffic assignment techniques rely on volume-delay functions to estimate increases in individual link traversal times as assigned traffic volumes approach the estimated traffic carrying capacity for the link. The Bureau of Public Roads (BPR) curve has often been used to estimate link travel times resulting from the assigned volumes. In recent years, a number of enhancements have been made to the process, due in part to increases in computing power. Volume-delay functions have been developed for different facility types (freeway versus arterial for example) and in some regions, intersection-based techniques mimicking Highway Capacity Manual (HCM) intersection delay estimation techniques have been implemented. The detail of the coding of networks has also increased dramatically, along with an associated reduction in the size of the traffic analysis zones. Most traffic assignment programs also provide an option for class-based traffic assignment so that single-occupant vehicles (SOV), high-occupant vehicles (HOV), and trucks can be assigned simultaneously, interacting on general purpose links, but also being able to travel on links restricted by vehicle class (e.g., HOV lanes).

Emerging traffic assignment techniques include DTA and regional traffic simulation. A key to the emerging techniques is that they explicitly account for the actual time to travel between an origin and destination for an interchange. In addition, the emerging techniques can account for traffic queues backing up to impact other links in the network. The emerging traffic assignment techniques may be more suitable for use with activity-based modeling techniques although some have been applied using the results of traditional four-step models.

The focus of this section is the validation of traffic assignments. Many of the validation techniques relate to link-based traffic volumes and travel times. The validation tests can be applied regardless of whether the assignment results were produced by a traditional assignment technique or an emerging technique.

# Sources of Data (Highway)

# traffic counts.

Traffic count data are the primary data used for the validation of traffic assignment procedures. Most traffic count data are obtained from various traffic count programs used for monitoring of traffic or collected for the Highway Performance Monitoring System (HPMS).

Traffic count data are an important independent validation data set. Nevertheless, traffic count data are often afforded more credence than they deserve. Counts are often collected from multiple sources such as state Departments of Transportation, toll authorities, counties, cities, and private contractors with each using various counting techniques. For example, counts from permanent traffic recorders, 48-hour or 24-hour counts performed using tube counters, and ancillary counts such as manual intersection counts may all be stored in the same database. Counts may be stored as raw counts or factored counts, such as average annual daily traffic (AADT). In addition, counts from multiple years surrounding a base year for model validation may be included for a validation in order to maximize the count data available.

In light of the above, the development of a validation database is a significant undertaking. In establishing the database, the data forecast by the regional travel model should be considered. Most regions develop travel models to provide forecasts of travel for an average weekday. Thus, the traffic count validation data should also reflect average weekday traffic (AWDT) for consistency. In addition to ensuring consistency of counts, the development of the traffic count database should also include consideration of geographic coverage, adequate representation of different functional classes, and completeness of screenlines. Inclusion of classification count data should be considered, especially if the travel model produces (and the region is concerned with) forecasts of high occupancy vehicles or truck volumes.

The variation of the count data should also be a concern in the development of the traffic count validation database. A traffic count for a facility is, in effect, a single sample of the set of daily traffic counts that occur on the link over a period of time. Thus, a single traffic count or a set of traffic counts for a single facility represent a sample for the link subject to sampling error. In 1981, the U.S. Department of Transportation published the Guide to Urban Traffic Counting, which included a figure depicting the expected coefficient of variation in daily counts. In 1997, a study of the variability of traffic count data included information from 21 permanent traffic recording (PTR) stations in Florida. 32 The curve depicting the original estimation of coefficient in variation of traffic counts and the observed data from Florida are shown in Figure 9.1 .

# HPMS Data

Regional vehicle-miles of travel (VMT) are estimated from traffic counts for the HPMS. The regional VMT estimates can provide a target for modeled VMT. However, prior to using the observed regional VMT based on the HPMS data, the consistency of the HPMS data and the modeled data should be verified. The consistency checks should include:

  • The HPMS area covered versus area covered by the travel model;
  • The facilities included in HPMS VMT (e.g., local street VMT) versus facilities included in model; and
  • Whether VMT estimates are based on average annual daily traffic or average annual weekday traffic.

# Travel Time and Speed Studies

Many regions have initiated the collection of travel times and speeds. These studies typically identify a number of corridors in the region each served by a single functional class such as freeway, expressway, principal arterial, or minor arterial. A number of travel time runs are then made through the specified corridors at various times of day to collect travel time, and thus, average travel speed information. The data collected can vary from simple end-to-end travel times to the components of the end-to-end travel times including run times, cruise times and signal delay times, delay times due to incidents, and in some studies, coincident traffic counts on the facilities traversed.

If traffic count data are collected along with the detailed travel time data, it may be possible to use the data to validate (or even to estimate) the volume-delay functions used in the traffic assignment process. Some regions have used detailed travel time and traffic count data to develop volume delay functions that result in validated traffic counts and traffic speeds being produced directly by the assignment process. Other regions use one set of volume-delay functions to produce validated traffic counts and a second set in an assignment post-processing step to estimate traffic speeds for air quality modeling.

As with traffic count data, travel time and speed studies may be subject to substantial variation depending on the day or days the data are collected. Nevertheless, the data collected can be quite useful in validating congested speeds produced by the travel model. With the strong connection between travel models and air quality models, the validation of congested speeds produced by the traffic assignment procedure is an important consideration.

Some regions also collect spot speed study data. These data may be useful for validation of modeled speeds for facilities uninterrupted by intersections such as freeways and expressways. Spot speed data are of limited use for arterials and other facilities with traffic control devices at intersections since delays resulting from the traffic control devices are not considered in the speed studies.

# Aggregate Checks (Highway)

A good approach to the validation of the traffic assignment procedure is to start with the most general aggregate checks and progress toward more detail. Aggregate checks should be generally applicable for both traditional traffic assignment procedures and for emerging techniques.

As mentioned previously, assignment is the culmination of the modeling process and, in effect, validates the entire modeling process. The aggregate VMT and Volume-to-Count Ratio checks provide this overall modeling process check more than subsequent tests that will be described later in this chapter. Different information regarding the modeling process can be inferred from each level of the summaries:

  • Regional summaries provide an indication of the reasonableness of the overall level of travel. The results help confirm that the trip generation, trip distribution, and mode choice models, or their activity-based modeling corollaries, as well as the assignment process, are performing reasonably.
  • Summaries by facility type provide an overall indication of the operation of the assignment procedures. These results of these summaries might indicate issues with free-flow speeds, link capacities, or volume-delay functions.
  • Summaries by geographic area may be useful for uncovering geographic biases in the modeling process. These biases might relate to previous steps in the modeling process. GIS plots of errors or percent errors by geographic area may facilitate this analysis.
  • Summaries by combinations of the above strata may provide additional diagnostic information if one of the above summaries indicates a validation problem.

# Vehicle-Miles Of Travel

As noted in HPMS Data , the base year VMT produced by the model can be compared to observed VMT estimates from HPMS. For comparisons with HPMS VMT estimates, modeled traffic for all network links should be considered.

The VMT checks should be made for the region and by market segment. Markets may include facility type, area type, or geographic subdivision (e.g., county or super-district). It is important when comparing VMT estimates to ensure that the lane miles covered by the model are consistent with lane miles from HPMS, and the total lane miles for each region and market segment should be reported along with the VMT statistics. Table 9.1 (a) provides an example of VMT summaries by facility type.

Table 9.1 (a) Example VMT Validation Summary by Facility Type

| | | Estimated ( a )

| Freeways | 112 | 23,342,838 | 24,078,537 | -735,699 | | Expressways | 33 | 3,477,618 | 3,306,422 | 171,196 | | Principal Arterials | 264 | 19,508,011 | 18,578,391 | 929,620 | | Minor Arterials | 351 | 7,125,530 | 7,257,875 | -132,345 | | Collectors | 399 | 8,911,433 | 9,178,980 | -267,547 | | Total | 1159 | 62,365,430 | 62,400,204 | -34,774 |

# Volume-to-Count Ratio

(opens new window) , for details). Many agencies feel that because of the intrinsic "fuzziness" in HPMS data, the performance of the model is better evaluated by limiting aggregate comparisons to locations for which counts have been collected in the model base year (or over a period that better reflects the detailed conditions the base year model is designed to reproduce).

While it may seem intuitive to perform a VMT comparison on just the links with quality counts (taking the observed and modeled counts, multiplying each by the link length on which the count was collected, and adding up the resultant "VMT"), the resulting statistics prove to be a poor basis for evaluating model performance. The difficulty lies in the fact that there is no independent standard for deciding what the "correct" link length should be (or, in other words, how many lane miles the count should be applied to). Note that when validating against VMT from HPMS, there is an independent standard - specifically the inventory of lane miles maintained in HPMS itself.

Practically, the difficulty with "partial VMT" is that if the same modeled and observed counts are evaluated with different sets of link lengths, the resulting statistics can be very different (even though the model is performing equivalently in both cases). To illustrate with a concrete example, consider a case where several river bridge crossings are each coded as a single long link, versus the same model with the bridges each divided into two shorter links (as might happen if a jurisdiction boundary follows the river and the model is required to produced VMT estimates by jurisdiction). If the model is doing a relatively poor job on cross-river trip distribution, the error will be magnified (relative to the other links that are aggregated into the result) if the bridge is coded as one long link, and minimized if the bridge is coded as two links but the count only placed on one link. Note that while one might attempt to get equivalent results by placing the same count on both links, doing so will wreak havoc with the link based statistics such as %RMSE described in Traffic Volume-Related Checks .

Because the link length coding is essentially arbitrary if using a subset of count locations, the statistical difficulty of variable link lengths can be neatly circumvented by applying a standard length to each count location (so that each count is equally "weighted" in relation to the other counts). The numerical consequences of the arbitrary lengths can be entirely removed if one uses the ratio of modeled and observed VMT to compare modeled and observed results (so the actual units of length cancel out). Because the standard length is arbitrary (but the same for all count locations) and does not appear in the final statistic, it is standard practice to consider it a unit value ("1 count length"). As a result, the ratio of modeled and observed "VCLT" (Vehicle Count Lengths Traveled) reduces numerically to its simpler mathematical equivalent: the ratio of (Modeled) Volumes to (Observed) Counts. Likewise, whereas in a VMT comparison one would report the number of lane miles evaluated, in Volume-to-Count comparisons one should report the sum of "count lengths" which (because we are using a "unit count length" at each location) is numerically equivalent to the number of count locations.

Checks of Volume-to-Count ratio should be made for the region and by market segment. Markets may include facility type, area type, geographic subdivision (e.g., county or super-district), or (if the data are available) time-of-day (e.g., morning peak period, afternoon peak period, mid-day and night). Volume-to-Count comparisons may also be useful for evaluating Screenlines, Cutlines, and Cordon Counts .

Table 9.1 (b) provides an example of Volume-to-Count summaries by facility type.\

Table 9.1 (b) Example Volume-to-Count Validation Summary by Facility Type

| | Modeled ( a )

| Freeways | 2,623,122 | 2,705,795 | -1,687 | | Expressways | 379,816 | 361,118 | 2,337 | | Principal Arterials | 1,724,786 | 1,642,594 | 2,005 | | Minor Arterials | 515,707 | 525,286 | -639 | | Collectors | 368,211 | 379,266 | -582 | | Total | 5,611,642 | 5,614,059 | -18 |

# Traffic Volume-Related Checks

Traffic volume related checks compare modeled to observed traffic volumes on a link-by-link basis. Consequently, the amount of difference between the modeled and observed traffic for each link contributes directly to the overall measure of closeness even when the results are aggregated in different ways. This is in contrast to the VMT checks described above where a positive difference on one link can cancel a negative difference on another link.

The traffic volume related checks described in this chapter focus on traditional measures that are scalable and easily explained: root mean squared error (RMSE), percent RMSE (%RMSE), correlation (R), and coefficient of determination (R2). There are other measures similar to the measures covered in this section, such as mean absolute error (MAE), that may be used or preferred by some. The key to the measures is that they are scalable. For example, an RMSE of 1000 is one-half as large as an RMSE of 2000 for a given set of links.

"Pass-fail" validation tests are not recommended or discussed in this section since they imply an unwarranted level of confidence in the results (and in the observed data) and do not provide useful information regarding the goodness of fit of the model. These measures can be characterized as "the results are ‘valid' if the value obtained for the validation test is less than five."

# Root Mean Squared Error and Percent Root Mean Squared Error

RMSE and %RMSE for a set of links can be calculated using the following formulae:

assignment process & validations

  • Count i = The observed traffic count for link i ;
  • Model i = The modeled traffic volume for link i ; and
  • N = The number of links ( 33 ) in the group of links including link i .

RMSE and %RMSE are both measures of accuracy of the traffic assignment measuring the average error between the observed and modeled traffic volumes on links with traffic counts. As such, RMSE and %RMSE should be summarized by facility type (or functional class) or by link volume group. Summarizing the measures by geography can provide good validation information, especially if the measures continue to be stratified by facility type or volume group. While the measures can be calculated for more aggregate groups or the region as a whole, the measure becomes less useful for determining the quality of the assignment process. In effect, at too gross a level of aggregation, the RMSE or %RMSE measures can easily be interpreted as pass-fail measures: "The regional %RMSE is 32 percent so, obviously, the model is…" Such statements have little validity or usefulness for model validation.

If the traffic assignment process used for a region uses a look-up table to estimate link capacity (e.g., stratified by area type and facility type), it is useful to summarize RMSE by the same strata. In this way, the average error on links can be compared to the estimated capacities of the links to determine if the average error is, say, more or less than one-half lane of capacity. If the RMSE is based on more than a one-hour assignment, as would typically be the case, the RMSE can be adjusted to reflect a one-hour period through the use of a peak hour factor. For example, suppose the RMSE for freeways in a suburban area type was 10,000 based on daily traffic counts and the modeled daily traffic volumes. If eight percent of the daily traffic occurred in the peak hour, the average error represented in the peak hour could be estimated as 0.08 x 10,000, or 800 vehicles. If the modeled capacity for freeway links in the suburban area was 2,200 vehicles per hour per lane, the implied average error would be equivalent to a little over one-third of a lane.

# Correlation Coefficient or Coefficient of Determination

Pearson's product-moment correlation coefficient (R) is a standard statistical measure available in spreadsheet programs and other readily available statistical software packages. R is a dimensionless index that ranges from -1.0 to 1.0 inclusive that reflects the extent of a linear relationship between two data sets. It is calculated as follows:

assignment process & validations

Where Count i , Model i , and N are as defined for the calculation of RMSE.

The coefficient of determination, R2, which is simply the square of R, is typically interpreted as the proportion of the variance in a dependent variable, y, attributable to the variance in an independent variable, x. This traditional interpretation does not hold for traffic assignment validation since the modeled traffic assignment is not dependent on the traffic count, or vice-versa.

These two measures have been frequently used in the past in validation. They measure the strength of the (linear) relationship between the assigned volumes and traffic counts. In effect, R2 has been assumed to be a measure of the amount of variation in traffic counts "explained" by the model. The measures must be used with caution. An R2 for all links in the region simply says that links with high capacities (e.g., freeways) can, and usually do, carry more traffic than links with low capacities (e.g., local streets). As such, R2 probably tells more about the coding of facility type and number of lanes than about how the model and assignment is performing. Thus, achieving a regional R2 of 0.88, as has been suggested as a "standard" for determining a model's validity, has little if any meaning.

If used carefully, R2 can be a useful measure for comparing model results to other iterations when calibrating travel models and traffic assignments since the bases (i.e., the sets of links considered) for calculating the measure should be the same between iterations. The R2 statistics should be calculated for links with similar characteristics such as facility type or volume group. As an example, if the R2 statistics for each facility type were consistently higher for Iteration "X" of a travel model calibration as compared to the results for other iterations, the model used for Iteration X might be considered to be the best. Of course, all modifications made to the model for Iteration X should be considered prior to ranking the final results of the various iterations.

# Scatterplots

Scatterplots of modeled traffic volumes versus the observed traffic volumes are useful validation tools and should be combined with the R2 summaries. Figure 9.3 shows two scatterplots with identical R2 values. Even though the R2 values are identical, the scatterplots tell very different stories regarding the modeled volumes. In Figure 9.3(a) , the modeled volumes are randomly distributed around the observed traffic counts within a constant band. Such results might suggest that the volume-delay functions are having relatively little effect in the traffic assignment. In Figure 9.3(b) , the scatterplots suggest that the amount of error in the modeled volumes is proportional to the traffic count or, in effect, to the capacity of the link.

Analysis of outliers can be a good method for finding and correcting network or assignment errors. Some outliers, links with high observed volumes and very low assigned volumes or vice-versa, can be identified from the scatterplots. An alternative method for identifying outliers is to simply list or plot the links with the largest differences between modeled and observed traffic volumes. It is also worthwhile to identify and investigate links with zero assigned volumes.

# Screenlines, Cutlines, and Cordon Counts

Comparison of modeled volumes to observed counts for critical links (see the detailed discussion in Volume-to-Count Ratio ), especially along screenlines, cutlines, and cordon lines, is useful for assessing model quality:

  • Screenlines extend completely across the modeled area from boundary cordon to boundary cordon. Screenlines are often associated with physical barriers such as rivers or railroads, although jurisdictional boundaries such as county lines that extend through the study area may also be used as screenlines. Figure 9.4 shows example screenlines for a region.
  • Cutlines extend across a corridor containing multiple facilities. They should be used to intercept travel along only one axis. Figure 9.5 shows example cutlines for multiple corridors a region. Cutlines 3, 6, 7, and 8 might be also considered screenlines if the entire modeling area is shown in Figure 9.5 .
  • Cordon lines completely encompass a designated area. For example, a cordon around the central business district is useful in validating the "ins and outs" of the CBD related traffic demand. Over or under estimates of trips bound for the CBD could indicate errors in the socioeconomic data (employment data for the CBD) or errors in the trip distribution or mode choice model.

# Detailed Difference Plots

Detailed plots of absolute or relative differences between modeled traffic volumes and observed traffic counts can provide useful diagnostic information for model validation. Figure 9.6 shows an example of such a difference plot. Detailed difference plots are more appropriate for validation of models for corridor studies or diagnosis of problems. Typically, there is too much information at a regional level, although the data may be filtered to show only differences greater than a specified threshold value.

# Speed Checks

Speed checks compare modeled speeds to observed data from travel time studies or, possibly, spot speed data for facilities not affected by intersection controls. The modeled speeds may be output directly from the traffic assignment process or they may be output from an assignment post-processor. The speed checks are focused on time-of-day or peak hour assignment results. While they can be easily calculated from VMT and vehicle-hours of travel (VHT) summaries for links, 24-hour average speeds are not very meaningful.

It is somewhat more difficult to define validation tests focused on speeds than it is to define traffic volume related validation checks. While modeled speeds can easily be calculated for each link, the modeled speeds are directly impacted by the quality of the assignment results. Thus, errors in assigned speeds might result from errors in the estimation of speeds or from errors in assigned traffic volumes. This issue might be addressed by filtering the links included in the test to include only those links where the assigned traffic volume is within, say, ±20 percent of the observed traffic count.

An initial validation check of modeled speeds can be prepared by producing scatterplots of modeled versus observed speeds. The scatterplots might look like the examples shown in Figure 9.3 with "Observed Speed" and "Modeled Speed" replacing "Traffic Count" and "Modeled Volume." The scatterplots should be produced by facility type and, if possible, by link volume group within the facility type grouping. The stratification by volume group would address two primary issues:

  • It is probably more desirable to match traffic speeds on high volume links than on low volume links; and
  • Speeds on low volume links should be close to free-flow speeds; if the free-flow speeds do not match reasonably, the veracity of the volume delay functions or the free-flow speed inputs can be questioned especially if the speeds for high volume links match closely.

# Speed Versus Volume/Capacity Ratio Comparison Plots

Both observed and modeled speeds can be plotted against volume/capacity ratios. The observed speeds should be plotted against the volume/capacity ratio for the observed traffic count at the time the speed information was collected. The modeled speeds should be plotted against the modeled volume/capacity ratio. The plots should be produced by facility type. Figure 9.7 shows an example of such a plot.

The comparison plot shown in Figure 9.7 is a method for verifying volume delay functions for the assignment. It is just as valid to plot the modeled speeds using the specified volume-delay function for a specified facility type. The comparison plots remove the impacts of differences in modeled traffic volumes and observed traffic counts inherent in the scatterplots of modeled versus observed speeds. The plot shown if Figure 9.7 suggests that the modeled speeds do not decrease quite quickly enough as the volume/capacity ratio increases.

With the increased use of global positioning system (GPS) units for household travel surveys, there will be an increase in "speed run" data for model validation. Since one of the assumptions underlying static-based equilibrium traffic assignment is that no traveler can reduce his or her travel time by switch travel paths, it should be increasingly possible to compare travel times on interchanges for selected times of day to modeled travel times for the same interchanges for comparable time periods. These comparisons will provide general information regarding the reasonableness of modeled travel speeds.

# Disaggregate Checks (Highway)

Disaggregate validation checks are focused on emerging traffic assignment techniques such as DTA and traffic simulation. As such, validation methods are also emerging and may require data that are not readily available. The following outlines two possible tests for the emerging techniques.

# Route Choice

If household or other travel survey data have been collected using GPS units, it might be possible to compare modeled to observed paths for selected trips. A measure of accuracy such as the percent of modeled links used matching observed links used might be useful.

# Traffic Flow

Aggregate tests such as link speed comparisons and traffic volume comparisons described above are useful for validation of the emerging techniques. However, additional tests might be appropriate, especially if GPS data are available. Specifically, for specific trips, it might be possible to compare the components of travel time for a selected route (e.g., stop delay time and travel time in motion). Alternatively, if traffic engineering data are available, modeled level of service measures (e.g., intersection delay) might be compared to observed data.

# Criteria Guidelines (Highway)

# aggregate validation checks.

In the Peer Exchange on Travel Model Validation Practices held in Washington, D.C. on May 9, 2008, a general consensus of participants was:

There was some agreement that setting validation standards for matching traffic counts, transit boardings, and screenline crossings can be a double-edged sword. While standards can be used to help determine relative model accuracy, they also can encourage over-manipulation to meet the standards. This can be especially true if project rankings or construction funds are based on absolute values rather than relative results. While almost any travel model can be manipulated to attain a specified validation standard, it is important to emphasize the use of appropriate methods to meet the standard. Methods used to achieve a reasonable match between modeled and observed traffic volumes can be as important as the reasonableness of the match itself. Therefore, model validation should focus on the acceptability of modeling practices in addition to attaining specified standards. A model validation that matches specified trip assignment standards within a reasonable range using valid modeling procedures is better that a model that matches observed volumes with a tighter tolerance using questionable modeling procedures. ( 34 )

Based on the above, this chapter reports some guidelines that have been used by various states and agencies. Specifically, Table 9.2 lists some example guidelines used for the match between modeled and observed VMT for Ohio and Florida. Figure 9.8 summarizes %RMSE guidelines used in Ohio, Florida, and Oregon. The Michigan Department of Transportation (MDOT) has targets of 5 percent and 10 percent for screenlines and cutlines, respectively, for percent differences in observed and estimated volumes by screenline. Figure 9.9 shows the maximum desirable deviation in total screenline volumes according to the observed screenline volume originally cited in Calibration and Adjustment of System Planning Models , produced by the FHWA in December 1990, and referenced in a number of documents, including the NCHRP Report 255, and the 1997 version of this manual. The guidelines in this section should not be construed as standards; matching or exceeding the guidelines is not sufficient to determine the validity of a model.

Table 9.2 Example VMT Guidelines by Functional Class and Area Type

| | '''Ohio ( a )

| Functional Class

| Freeways/Expressways

| Principal Arterials

| Minor Arterials

| Collectors

| All Links

| Area Type

# Disaggregate Validation Checks

There are no specific criteria guidelines associated with disaggregate traffic assignment checks described above.

# Reasonableness and Sensitivity Testing (Highway)

Reasonable ranges of VMT per household are 40 to 60 miles per day for large urban areas and 30 to 40 miles per day for small urban areas. The 1990 NPTS reported an average of 41.37 vehicle miles traveled per household daily. The average increased to 58.05 vehicle miles of travel in the 2001 NHTS (although differences in the survey methods account for some of the increase). Reasonable ranges of VMT per person are 17 to 24 miles per day for large urban areas and 10 to 16 miles per day for small urban areas.

Traffic assignment techniques vary from region to region. Based on a review of the model documentation of assignment procedures used by 40 different MPOs throughout the country:

  • About 70 percent use time-of-day traffic assignment procedures;
  • 75 to 80 percent perform class-based assignment techniques; and
  • 20 to 30 percent perform speed equilibration for some of the assigned time periods.

Table 9.3 summarizes the ranges of coefficients and exponents of BPR-like volume delay functions as reported by 18 of the MPOs. The BPR-like function estimates the congested travel time on a link using the following formula:

assignment process & validations

  • Time final is the final, congested travel time on a link;
  • Time initial is the initial, or free-flow, travel time on a link;
  • V is the assigned volume on a link;
  • C is the capacity of the link (at level of service E); and
  • α and β are model coefficients

Table 9.3 Range of Reported BPR-Like Assignment Parameters (18 MPOs)

| | Minimum

| Freeways | 0.10 | 1.20 | | Arterials | 0.15 | 1.00 |

Sensitivity testing of traffic assignment procedures can be performed by making changes to the networks or input trip tables used for assignment. Several approaches are as follows:

  • Regional sensitivity - Check reasonableness in change in VMT to changes in total trips. Increase (factor) trips by a factor (e.g., 1.5) and check to see that total VMT changes by a similar factor. If there is little congestion in the region, VMT should increase by a similar factor. If there is substantial congestion, VMT should increase by more than the factor.
  • Localized sensitivity - Modify key network elements and review assignment results for changes and reaction to network elements (using a fixed trip table). For example, remove a key bridge or limited access facility and review the impact on traffic using volume difference plots between the original and modified alternatives.
  • Over-sensitivity - For congested networks, make a minor change to a network (e.g., add a lane of traffic to a minor arterial link) and reassign a fixed trip table using same number of iterations and closure criteria. Review the impact on traffic using volume difference plots between the original and modified alternatives. Traffic impacts should be very localized.

# Troubleshooting Strategies (Highway)

Since traffic assignment is the culmination of the modeling process, issues can easily be related to previous steps in the modeling process. It is, however, always valid to start the troubleshooting with the traffic assignment step and work backwards through the modeling process. Table 9.4 provides some troubleshooting strategies for common issues that might occur with a traffic assignment.

Table 9.4 Troubleshooting Strategies for Issues with Traffic Assignment

| 1. Low, high, or unrealistic base year modeled link volumes compared to traffic counts | - Check network coding (speeds, capacities, etc.) on these links, nearby/adjacent links, and links on competing paths - Check TAZ connections and loading at centroids - Check traffic count data | | 2. Uneven facility loading on parallel competing routes | - Review centroid connections - Review facility and area type coding and input starting speeds for assignments - Review zone structure and number of zones – may need to have finer spatial resolution - Review final congested speeds and volume-delay functions | | 3. Travel times not representative of observed data | - Review facility and area type coding and input starting speeds for assignments - Review final congested speeds and volume-delay functions | | 4. Links with zero assigned volume | - Check network coding (including nearby or competing links) for continuity, stub links, centroid connector locations, and attributes such as free-flow speeds and capacities | | 5. Links with very high assigned volume/‌capacity ratios | - Check network coding (including nearby or competing links) for centroid connector locations and attributes such as free-flow speeds and capacities |

==== Forecast Checks (Highway) ==== The forecast year validation checks for traffic assignment should concentrate on comparisons of the forecast year model results to the base year model results. The base year observed data are no longer directly considered. Unlike the base year comparisons, however, the objective is not to achieve a close match between the forecast and base year results, but rather to ensure that the differences and trends are reasonable. For example, it may be reasonable to expect that VMT per capita increases somewhat over time especially in congested regions due to increased circuitry of travel.

The main comparisons are similar to the comparisons previously done between base year model results and observed data. These may include regional, subregional, and corridor specific checks. Examples of regional and subregional checks include:

  • VMT per capita;
  • Total VMT by functional class;
  • Average congested speeds by functional class;
  • Changes in VMT by functional class; and
  • Changes in volumes crossing screenlines, cutlines, and cordon lines.

Examples of corridor-level checks include:

  • Difference plots of future versus base year traffic; and
  • Comparisons of speeds on facilities.

Traffic for specific facilities should not always be expected to increase. Facilities that are congested in the base year may not be able to handle significantly more traffic in the future or capacity improvements or new roadways in other areas might minimize increases in traffic on specific facilities.

# Transit Assignment Checks

Traditional transit assignment procedures have focused on the assignment of peak and off-peak period trips in production-attraction format in an effort to reproduce daily transit boardings by line and, in many cases, the ridership at maximum load points along the line. For regional travel forecasts, the output of transit assignments may be somewhat less rigorous than the requirements for traffic assignments. The regional assignments have been used to determine information such as the number of transit vehicles required based on the frequency of service required to serve the forecast transit demand.

The amount of scrutiny received by transit assignments may increase substantially when a region applies for FTA Section 5309 New Starts funds. The FTA encourages rigorous checking of transit networks, transit path-building procedures, and transit assignment results.

Generally available transit assignment procedures include all-or-nothing, all shortest paths, and a number of multipath assignment procedures. The multipath transit assignment procedures are heuristic procedures used to represent the optional path choices and path use in robust transit systems rather than transit path-switching due to capacity constraints. Capacity constrained transit assignment techniques are rarely required. Few regions reach crowding to an extent that people switch from one transit path or mode to another to avoid the over-crowding. In cases where this does occur, ad hoc techniques such as "shadow pricing at park and ride lots" are used to "move" transit ridership to different lines.

Some regions developing activity-based travel models have moved from peak and off-peak transit assignments in production-attraction format to true time-of-day assignments in origin-destination format. This change can be considered to be evolutionary; not revolutionary. The transit assignment validation checks for origin-destination-based transit assignments are similar to those used for more tradition transit assignments in production-attraction format.

As discussed in Mode Choice , transit assignment validation is closely related to mode choice model validation, as it regards transit mode choices. Issues identified during checks of transit assignment results may be caused by issues with the mode choice model, and vice versa, and issues with both model components may be related to transit path building and network skimming procedures.

# Sources of Data (Transit)

The primary source of data for transit assignment validation is the transit operator. The most generally available data are count data such as boardings by line and park-and-ride lot utilization counts. Some regions will also have on-board survey data available for validation.

# Boarding Count Data

Most transit operators collect boarding count data by line on a continuous basis through the use of recording fare boxes or the performance of periodic counts. In some cases, the data may be available by time-of-day. Route-level boarding count data can be easily aggregated by mode or by corridor.

Some transit operators are installing Automated Passenger Counters (APCs) on their transit vehicles or perform periodic "boarding and alighting" counts on transit lines. If both the numbers of boardings and the numbers of alightings by transit stop are available, route profiles can be constructed for the lines. Detailed boarding and alighting data at bus stops also provide the means for developing route profiles; screenline, cutline, and cordon line counts; and estimates of passenger-miles of travel (PMT).

# Park-and-Ride Lot Utilization

Regions that have an established park-and-ride system may collect parking lot utilization data for the various lots. The data collected may range from number of spaces used on a daily basis to the number of vehicles parking at the lot on a daily basis to license plate surveys of parking lots. Vehicle counts at park-and-ride lots are superior to counts of used parking spaces since the vehicle counts provide a clearer picture of park-and-ride lot demand.

# Transit Rider Survey Data

Transit rider survey data, often collected as on-board survey data, provide a wealth of information for detailed transit assignment validation including transfer rates, numbers of linked trips, and access and egress modes. Survey data that represent all transit service in the modeled region provide the information necessary to develop "observed" transit trip tables. The development of the observed trip tables requires careful expansion of the on-board survey data to match boarding counts.

When observed trip tables are available, it is possible to focus the validation on the actual transit assignment procedures since the validation will not be impacted by the veracity of the trip tables produced by the rest of the modeling process. This is in contrast to the traffic assignment validation process where it is not possible to collect sufficient data to develop observed auto trip tables for a general validation of the traffic assignment process.

Transit operators who have received FTA New Starts funding are required to perform on-board surveys before and after the construction to determine who is using and benefiting from the new system. The availability of before and after trip tables provides unique data for transit assignment validation, including the ability to perform temporal validations.

# Other Data

In some areas, operators may have other useful data available, especially where automated passenger counting or fare collection is performed. For example, transit systems that use "smart cards" or similar technology may have information on boarding and alighting stations of passengers, which could be compared to model results.

# Aggregate Checks (Transit)

Aggregate data checks may be performed using trip tables resulting from the modeling process through mode choice and, possibly, time-of-day modeling, or using transit trip tables from comprehensive on-board survey. If observed trip tables are available, tests should be performed using those tables since differences between modeled and observed transit validation measures can be more fully attributed to the transit networks and transit assignment process.

# Boarding Count Checks

Most aggregate transit assignment checks begin with the comparison of modeled to observed transit boardings. In addition to total system boardings, these comparisons may include boardings by line and by mode. The checks may be performed by time-of-day. Validation checks typically consist of comparing absolute and relative differences between modeled and observed boardings by line. Since most regions have relatively few transit lines, checks by line are typically reported for each line. The reports may be stratified by percent difference to facilitate diagnosis of transit assignment problems.

Comparison of modeled to observed boardings at major transfer points provides another set of validation checks. The major transfer points may include park-and-ride lots, fixed guideway transit stations (e.g., light-rail stations), and bus transit centers or "pulse-points."

The assignment of an "observed" transit trip table (based on expanded data from a transit rider survey) can be valuable in providing an "in-between" data point for transit assignment validation. If the modeled boardings resulting from the assignment of the "observed" transit trip table match the observed boardings reasonably well, but the modeled boardings resulting from the assignment of the transit trip table from the mode choice model do not match up well with the observed boardings, issues with the mode choice model (or preceding models such as trip distribution) may be indicated. If the results from assignments using both trip tables ("observed" and from the mode choice model) match each other well but not the observed boardings, there may be issues with the transit network or path building procedures (although checks of the observed data, boardings and transit survey, should also be performed).

# Boarding- and Alighting-Based Checks

If detailed boarding and alighting data are available, it is possible to construct observed transit route profiles such as the example shown in Figure 9.10 . This information provides the means to compare modeled to observed volumes along transit lines. Modeled line profiles may be compared to observed profiles for selected lines.

Modeled PMT for the region, by line, by mode, by access mode, or by time-of-day can be compared to observed PMT when detailed boarding and alighting counts are available.

# Transit Rider Survey-Based Checks

If a transit rider survey is available, the regional transfer rate or boardings per linked trip can be estimated. This information can also be estimated from boarding counts provided the operator provides transfers and records boardings by fare payment type. Modeled boardings per linked trip can be estimated from the transit assignment results. As with previous aggregate checks, this comparison can be made based on the assignment of observed transit trip tables or based on the assignment of modeled trip tables.

# Disaggregate Checks (Transit)

The following checks must be performed using data collected in a comprehensive transit rider survey. The following checks are not truly disaggregate as defined for discrete choice models, but are substantially more detailed than the aggregate checks described above. In effect, these checks involve comparisons of transit paths reported by travelers in the survey to modeled paths. The disaggregate checks are based on the analysis of individually reported transit trips rather than the assignment of an observed transit trip table for the region.

The reported trips should be compared to transit paths build using procedures consistent with the transit assignment process. For example, if transit trips using walk access are forecasted by the mode choice model and assigned separately for local bus and rail, individually reported trips for travelers using walk access to local bus only should be compared to the modeled walk access to local bus transit path information. Likewise, the individually reported trips for travelers using rail in their transit trip should be compared to the modeled walk access to rail transit path information. Conversely, if the mode choice model forecasts only total walk access trips, the individually reported trips for all travelers using walk access should be compared to the modeled walk access to transit path information. In this case, it might be worthwhile to check the prediction success of boardings by mode (e.g., local bus and rail for this example) rather than total boardings on the interchanges.

Comparison of modeled to reported transit paths can be used to prepare prediction success tables of the transit path-builder and path-building parameters used for the assignment process. While modeled paths could be compared to reported paths and the results summarized in "pass-fail" form, such an approach could be extremely time consuming. The process can be automated to summarize key variables. Table 9.5 is an example of a prediction success table for modeled to reported boardings on individual transit paths and Table 9.6 shows a summary of the results.

Table 9.5 Example Prediction Success Table for Transit Assignment

| | No Path

| On-Board Survey Reported Boardings

Table 9.6 Example Prediction Success Table Summary for Transit Assignment

| | Equal Reported Boardings

| Walk Access | 854 | 67% | | Drive Access | 424 | 67% | | All Trips | 1,278 | 67% |

# Criteria Guidelines (Transit)

The same caveat regarding setting guidelines for aggregate traffic assignment validation checks can be made for aggregate transit validation checks. Setting guidelines is double-edged sword that may lead to over-manipulation of transit assignment procedures. Consequently, this chapter reports some guidelines that have been used by various states and agencies. The guidelines in this section should not be construed as standards; matching or exceeding the guidelines is not sufficient to determine the validity of a model.

It should be noted that the FTA does not specify guidelines for the New Starts program other than that the overall modeling process should "tell a coherent story." The FTA focus is on reasonable reproduction of the transit network and transit travel times and reasonableness of predicted changes between current and future ridership coupled with reasonableness of changes between future base and future build alternatives.

What is being validated must be considered. If observed trip tables from a comprehensive transit rider survey are being assigned and used as a basis for the validation, much more emphasis is being placed on transit assignment procedures (although there is some consideration of the veracity of the "observed" trips tables and expansion factors). In this case, a "tight" validation might be desired. Alternatively, if a modeled trip tables from mode choice and transit time-of-day processing is being assigned to provide the modeled transit boardings and transit flows for validation, the validation actually covers the entire modeling process to that point in addition to the validation of the transit assignment process. In this case, desired criteria might be less stringent.

Example transit assignment validation results for several areas are shown in Tables 9.7 through 9.10 . Tables 9.9 and 9.10 show example transit screenline validation results and guidelines.

PMT for transit assignment is analogous to VMT for traffic assignment. As a result, any regional VMT guideline set for traffic assignment results might be used for regional modeled PMT to observed PMT. For example, if regional guidelines suggest that regional VMT be within ±5 percent of the observed VMT, the same guideline might be considered for the transit assignment.

There are no specific criteria guidelines associated with disaggregate transit assignment checks described above.

Table 9.7 Example Transit Validation Results for Sacramento Region

| Transit Linked Trips

| Walk (RT Only)

| Drive (RT Only)

| RT Subtotal

| '''Other Bus ( b , c )

| Transit Boardings by Bus/LRT

| '''Other Bus ( b )

| LRT Boardings (By Access Mode at Production End of Trip)

| Total LRT Boardings

| Bus Boardings (By Access Mode at Production End of Trip)

| Transfer (RT Only)

| Total Bus Boardings

Sources: DKS Associates, 2002; and Sacramento Regional Travel Demand Model Version 2001 (SACMET 01) , prepared by DKS Associates for Sacramento Association of Governments, prepared by DKS Associates, March 8, 2002, Table 43.

Table 9.8 Example Transit Validation Results for Seattle Region

| King County Metro | 92,940 | 77,627 | | Pierce Transit | 9,987 | 11,440 | | Community Transit and Everett Transit | 10,070 | 7,662 | | Kitsap Transit | 4,403 | 3,967 | | Washington State Ferries | 11,372 | 2,114 | | Sound Transit | 10,006 | 8,900 | | Total | 138,778 | 111,710 |

Source: PSRC Travel Model Documentation (for Version 1.0) – Updated for Congestion Relief Analysis , prepared by Cambridge Systematics, Inc., for Washington State Department of Transportation and Puget Sound Regional Council, September 2007.

Notes: Observed boardings are from the National Transit Database (NTD). Sound Transit boardings were reported in NTD under other operators, King County Metro, Pierce Transit, and Community Transit.\

Table 9.9 Example Transit Assignment Validation Guideline for State of Florida

| | Acceptable

| Regional Estimated-over-Observed Transit Trips (Boardings) | ± 9% | | Transit Screenlines | ±20% | | Transit Line Ridership: <1,000 Passengers/Day | ±150% | | Transit Line Ridership: 1,000-2,000 Passengers/Day | ± 100% | | Transit Line Ridership: 2,000-5,000 Passengers/Day | ± 65% | | Transit Line Ridership: 5,000-10,000 Passengers/Day | ± 35% | | Transit Line Ridership: 10,000-20,000 Passengers/Day | ± 25% | | Transit Line Ridership: >20,000 Passengers/Day | ± 20% |

Source: FSUTMS-Cube Framework Phase II – Model Calibration and Validation Standards: Model Validation Guidelines and Standards , prepared by Cambridge Systematics, Inc., for Florida Department of Transportation Systems Planning Office, December 31, 2007.\

Table 9.10 Example Transit Screenline Results for Seattle Region

| 132nd SW, Snohomish County | 5,825 | 6,883 | 1,058 | 18% | | Snohomish County Line West | 10,590 | 11,449 | 859 | 8% | | Snohomish County Line East | 2,010 | 1,582 | -428 | -21% | | Ship Canal Bridges | 65,970 | 56,160 | -9,810 | -15% | | Lake Washington Bridges | 20,670 | 21,999 | 1,329 | 6% | | Newport Eastside | 3,430 | 4,948 | 1,518 | 44% | | South Spokane Street | 60,100 | 32,347 | -27,753 | -46% | | West Seattle Bridges | 21,500 | 20,752 | -748 | -3% | | South 188th Street, King County | 21,170 | 10,703 | -10,467 | -49% | | Pierce County Line | 6,860 | 4,780 | -2,080 | -30% | | 40th Street, Tacoma | 9,300 | 2,544 | -6,756 | -73% | | Eastside, North of I‑90 | 9,850 | 3,916 | -5,934 | -60% | | Eastside, East of I‑405 (E-W | 2,760 | 2,258 | -502 | -18% | | Eastside, North of Kirkland | 8,100 | 6,602 | -1,498 | -18% | | Eastside, North of Renton | 2,630 | 3,209 | 579 | 22% | | South King County (E-W Movements) | 10,260 | 3,433 | -6,827 | -67% | | Subtotals | | | | | | King County – Seattle | 199,670 | 145,394 | -54,276 | -27% | | King County – Eastside | 26,770 | 20,932 | -5,838 | -22% | | Pierce County | 16,160 | 7,324 | -8,836 | -55% | | Snohomish County | 18,425 | 19,914 | 1,489 | 8% | | All Screenlines | 261,025 | 193,564 | -67,461 | -26% |

Source: PSRC Travel Model Documentation (for Version 1.0) – Updated for Congestion Relief Analysis , prepared by Cambridge Systematics, Inc., for Washington State Department of Transportation and Puget Sound Regional Council, September 2007.\

# Reasonableness and Sensitivity Testing (Transit)

Perhaps the best reasonableness test that can be applied to transit assignment results is the application of the "tell a coherent story" philosophy to the transit assignment. In effect, the transit assignment process should "tell a coherent story" regarding how transit riders behave. Beyond that suggestion, there are several reasonableness checks that can be made:

  • Are the transit path-building parameters used for the transit assignment consistent with the mode choice model coefficients?
  • Does the number of boardings per linked trip (or transfer rate) make sense? Boardings per linked trip are typically in the range of 1.2 to 1.6 with the higher rates in regions with grid-based bus systems and fixed guideway transit modes (e.g., light rail, heavy rail, or bus rapid transit).
  • Do maximum load point locations make sense (even if observed locations for maximum load points are not available)? For example, maximum load points for radial transit lines focused on a central business district or some other major generator should be reasonably near the major generator. For cross-town routes, the maximum load point should probably be closer to the central portion of the route.

Sensitivity testing of transit assignment procedures can be performed by making changes to the networks or input trip tables used for assignment. Some approaches include:

  • '''Regional sensitivity – '''Check the reasonableness in changes in total boardings to changes in total trips. Increase trips by a factor (e.g., 1.5) and check to see that total boardings change by a similar factor.
  • '''Localized sensitivity – '''Modify speeds or headways on selected routes and observe the changes in boardings (especially in areas where there is "competition" among transit routes). Do faster or more frequent routes attract more riders? Remove routes and observe change in ridership on other routes.
  • '''Mode sensitivity – '''If walk to rail (or walk to premium transit) is assigned separately from walk to local bus, increase rail trips on specific interchanges that must use background bus to access rail by a known number of linked trips. Verify that rail boardings increase by at least (or exactly) the increase in the number of linked trips.

# Troubleshooting Strategies (Transit)

Transit assignment, like traffic assignment, is the culmination of the modeling process. As a result, issues can easily be related to previous steps in the modeling process. However, unlike traffic assignment, it might be possible to isolate transit assignment issues to the transit assignment process if an observed transit trip table from an on-board survey is available. Table 9.11 provides some troubleshooting strategies for common issues that might occur with a traffic assignment. Also refer to Table 7.6 , which presents the analogous strategies for the mode choice model.

Table 9.11 Troubleshooting Strategies for Issues with Transit Assignment

| 1. Low or high boardings/ridership compared to route/stop boardings | - Check network coding (stops, etc.) on the affected routes/stops, nearby/adjacent routes, and competing routes - Check transit access links - Check run times, speeds, and/or dwell times for routes - Check level of zonal resolution and transit walk access percentages - Check trip tables for consistency between trips in corridor and observed boardings - Modify path-building/assignment parameters - If using multi-path assignment procedures, investigate changes in route "combination" factors - Investigate changes to transfer penalties - Investigate changes to relationships between wait time, out-of-vehicle time, in-vehicle time, and transit cost | | 2. Low or high boardings per linked trip | - Review walk network assumptions - Investigate changes to transfer penalties - Modify assignment procedures - Increase market segmentation - Modify path-building/assignment parameters - If using multi-path assignment procedures, investigate changes in route "combination" factors - Investigate changes to transfer penalties - Investigate changes to relationships between wait time, out-of-vehicle time, in-vehicle time, and transit cost |

# Forecast Checks (Transit)

Certain basic statistics such as the number of boardings per linked trip and PMT per linked trip should remain relatively constant between the base year and a future year unless, of course, the transit system has been modified in such a way that it directly impacts one of those statistics. For example, the introduction of light-rail transit in a region would probably increase the boardings per linked trip and the introduction of commuter rail might increase the PMT per linked trip.

The FTA has suggested a number of checks that should be used when producing ridership forecasts for a Section 5309 New Starts analysis, but these suggestions would be applicable for any future transit assignment (regardless of whether it is for a New Starts project). Figures 9.11a and 9.11b summarize the FTA suggestions for forecast checks.

Figure 9.11 FTA New Starts-Based Forecasting Checks

(a)Demonstrating Reasonable Predictions of Change

* Models should provide reasonable predictions of change

  • Between today and a future no-build condition
  • Between a future no-build condition and a realistic alternative (i.e., a change in the transportation system)
  • Findings can highlight problems not prevalent in base year conditions

(b)Common Tests for Reasonable Forecasts

| Compare model results from…

| 1 | Previously validated year | Base year validation | Past to the present | Changes in demographics and employment and transportation supply | | 2 | Base year validation | Future year no-build | Present to the future | Demographic and employment forecasts | | 3 | Future year no-build | Future year TSM | The future to a modestly-changed future | Transportation supply (modest) | | 4 | Future year TSM | Future year Build | The modestly-changed future to a future with a big project | Transportation supply (major) |

(opens new window) , accessed October 5, 2009.

This site uses cookies to learn which topics interest our readers.

  • Copyright Policy and Terms of Use

Pharma Beginners

Process Validation SOP and Protocol

  • Post author: pharmabeginers
  • Post published: December 21, 2019
  • Post category: cGMP / Protocol / QA Sop / SOPs / Validation
  • Post comments: 5 Comments

Process Validation: Establishing documented evidence through collection and evaluation of data from the process design stage to routine production, which establishes scientific evidence and provides a high degree of assurance that a process is capable of consistently yield products meeting pre-determined specifications and quality attributes.

SOP and Protocol for Process Validation of Drug Product

1.0    purpose:.

  •  The purpose of this procedure is to provide a high degree of assurance of meeting all the predefined attributes and the process is capable of consistently delivering a quality product.

Visit to copy this SOP

2.0    SCOPE:

  • This SOP is applicable for Process Validation / Qualification activities of drug products.

3.0    REFERENCES:

  • Protocol & Report Numbering And Issuance System SOP.
  • SOP for Change Control Procedure
  • Planned Modification System SOP

4.0    RESPONSIBILITY:

  • Quality Assurance Dept. shall responsible for preparation of process validation protocol, collection of process validation sample, and preparation of Process Validation report.
  • Quality Control Dept. shall responsible for the analysis of the Process Validation sample and provide the data for the process validation report to Quality Assurance Dept.
  • Maintenance Dept. shall responsible for preventive maintenance and calibration of equipment and instruments respectively.
  • QA Head shall review & approved process validation protocol, approve validation report for its completeness and correctness with respect to all data and report, and to ensure implementation of SOP.

5.0    PROCEDURE

Process validation methodology.

                     This guidance describes the process validation activities in three stages:

Stage 1 – Process Design:

  • The commercial process is defined during this stage based on knowledge gained through development and scale-up activities.
  • The goal of this stage is to design a process suitable for routine commercial manufacturing that can consistently deliver a product that meets the majority of its quality attributes of activities related to stage -1 shall be performed, suggested by FDD.

Stage 2 – Process Validation:

  • During this stage, the process design is confirmed as being capable of reproducible commercial manufacturing.

This stage shall be done in two parts:

Design of the facility and qualification of the equipment and utilities:.

  • Qualification of utilities and equipment shall be covered under individual plans or as part of an overall project plan.
  • The details of the same shall be mention in the Protocol.
  • Qualification activities must be completed prior to the start-up of the Process Performance Qualification (PPQ) stage.
  • The suitability of equipment and utilities must be documented in accordance with the process requirements in all the anticipated operating ranges.

Process Performance Qualification:

  • During stage 2 and onward, cGMP compliance must be followed.
  • Successful completion of stage 2 is necessary before commercial distribution.
  • Need of training shall be assessed prior to the start-up of PPQ batches.
  • During this stage, the process design is evaluated to determine if the process is capable of consistently manufacturing the product meeting predetermined acceptance criteria.
  • Process for new as well as existing products shall be qualified.
  • Process qualification shall run according to approved protocol detailing sampling, timing, location, procedures alongwith analytical tests and acceptance criteria.
  • Three batches of commercial batch size shall be taken for qualification in accordance to the Process Qualification protocol and BMR.

Stage 3 – Continued Process Verification:

  • Ongoing assurance is gained during routine production that the process remains in a state of control.
  • During this stage, continuous monitoring of process parameters and quality attributes at the level established during the process validation stage shall be done.
  • This stage is applicable for Existing Products, Site Transfer Products, and New Products.

Process Development and Trial Batch Plan:

  • R&D/FDD shall generate knowledge and understanding about the manufacturing process and the product at the development stage.
  • Manufacturing process,
  • Critical process parameters,
  • In-process checks,
  • Specifications for input materials,
  • Intermediate products and
  • Final products.
  • Based on the requirement and risk assessment R&D shall recommend for the trial batch(es) manufacturing prior to commercialization.
  • The batch/lot size of the trial batch shall be decided based on the equipment occupancy level and other scientific rationales so that the data, observation & experience from the trial batch will be useful for preparing the batch record and process validation protocol/report for commercial batches.
  • The trial batch/lot size shall not be less then 1/10 th of the intended commercial batch size, keeping the set of equipment same.
  • Principle of operation shall be identical.
  • Prepare a trial batch report as per the Annexure 4.
  • The trial batch report shall be duly signed by the Production, QA, R&D, Manufacturing head and Quality head and shall be retained with QA for reference.
  • Photocopy may attached to the relevant BMR.
  • Based on the trial batch report & recommendations, Prepare the commercial batch manufacturing record & process validation protocol and Initiate the commercial batch manufacturing.
  • R&D shall revise and send the MPS to the site prior to post validation BMR revision, if any revision is recommended /identify during execution of process validation batches.

Process Validation Pre-Requisites :

                    Prior to Process Validation study. Complete the following prerequisite activities .

  • Qualify the Manufacturing Equipment and utilities to meet cGMP requirements.
  • Qualification of Utilities to be used in manufacture of the product (example – purified water, compressed air, HVAC system etc.) .
  • Calibrate the Instruments used in processing (example – weighing balance. vernier calipers, hardness tester etc.).
  • Validate the Analytical methods for in process testing and finished product analysis .
  • Train appropriately the personnel involved in manufacturing and testing of process validation batches .

Process Validation Protocol:

                  On satisfactory completion of pre requisite activities, Prepare the process validation protocol as                                          described below.

  • QA shall prepare the Process Validation protocol  as per Annexure-3.
  • This protocol shall applicable for both commercial as well as trial batches.

In case of process validation protocol of clinical trial batches,

  • FDD shall review the Protocol / report and approve prior to execution at site,
  • Approval of protocol can ensure by additional signature on same protocol as a proof or.
  • Attach any supporting communication  to the respective clinical trial batch process validation protocol.
  • Production and QC shall review the Process Validation Protocol.
  • Head QA shall approve the protocol.
  • The Process Validation team members of different departments (Production, QC& QA) shall review Process Validation Protocol for the correctness of their relevant matters.
  • Designated person from Production shall ensure the suitability of the equipments listed in the protocol;
  • Whether the range and set point of process parameters is in line with measuring device available on the respective equipment / instrument;
  • Whether the equipment and measuring instruments are in calibrated status.
  • The correctness of carried QC tests at different process stages and availability of required testing methodology .
  • Process description,
  • Flow chart,
  • Critical process parameters, their range and set point,
  • Input material, their quantities,
  • Names of vendors stated in the protocol.
  • Process flow,
  • List of equipment, equipment with qualification status,
  • List of input materials,
  • In-process checks are correct;
  • Sampling plan is adequate to assess the capability of the process to consistently produce product meeting required specifications.

Execution of Process Validation :

  • Execute a minimum of three consecutive batches against the approved BMR and the Process validation protocol.
  • Where multiple batch sizes are available for a product, Perform the PV for each batch size.
  • However PV plan can restrict to only those unit processes that are evaluated to have impact due to difference in batch size.
  • For example if there is no change in lot size at Granulation stage and only number of lots increased,
  • Perform the PV of only Blending operation and decide the extent of validation study of other stages  based on the risk/impact assessment.

Where a product is manufactured in multiple strengths using a common blend,

  • Employ a matrix approach for PV.
  • For example if a product is manufactured as 10 mg, 20 mg and 40 mg by a common blend,

                              Then the PV can include validation up to blend stage with three batches of common blend and                                           validation of subsequent unit processes like compression, coating etc. with three batches each                                           strength.

  • In such cases number of batches of different strength may reduce with appropriate justification and necessary approval from Customer / Regulatory agency.
  • Collect the samples as per sampling plan defined in the PV protocol & tested in QC and PV team shall obtain the results to compiled for evaluation by the PV team.
  • Capture the values of critical process parameters noted during in-process of the PV Batches as per Annexure-5 (applicable for both commercial as well as trial batches)
  • The variations in the critical process parameters in lot to lot/batch to batch shall justify with scientific logic and shall capture in batch manufacturing record as well as PV.  

At compression stage Challenge study performs like

  • Hopper study,
  • Speed study and
  • Cycle study ranges shall perform for the minimum, optimum and maximum ranges and Record in the attachment of respective batch number.
  • Checking of results from testing of in-process samples, intermediate product and final product of the PV Batches by QC person for correctness and compliance to respective acceptance criteria.
  • Same shall checked by QA person independently.
  • Variability ‘within’ a validation batch shall assess by QA by comparing the results of samples drawn from various locations / different intervals using the Relative Standard Deviation criteria pre-defined in the protocol.
  • Likewise, QA shall assess the variability ‘between’ Validation Batches by comparing the process parameters and test results of each batch at every stage of testing with the other PV Results.
  • Investigate significantly different to determine the cause of variability.

Process Validation Report

  • QA shall prepare the process validation report by compilation of BMR data and QC analytical report as per Annexure 4
  • Record all data during execution of process validation batches as per Annexure 5 for each batch.
  • Preparation of the interim report first, second and third after completion of manufacturing and packing process of respective batches.
  • QA shall prepare the interim report and  reviewed by Production department and approved by QA Head.
  • Compile all finished product result as per acceptance criteria and Attach with reports.
  • Any change control/events observed during processing of PV batches shall handle as per Change control procedure and event SOP respectively.
  • Prepare a process validation summary report after completion of PV batches as per Annexure 6.
  • QA shall prepare Process validation summary report and reviewed by Production and approved by QA Head.
  • Share the approved Process Validation summary report with production department to freeze all the critical process parameters and revise the BMR.

Release of Process Validation batches for distribution

  • Successful completion of PV activity and review, approval and signing off the PV interim report with supporting raw data.
  • Ensure no impact on product quality prior to release of each PV batch.

Re -Process Validation criteria:

  • In case of any change in facility.
  • In case of any change in manufacturing equipment.
  • Change in vendor of Active Pharmaceutical Ingredient
  • Change in batch size
  • Based on associated risk and impact analysis the extent of PV shall  decide which may include the entire process that is impacted.

For introduction of a New product in the facility:

  • Initiate a permanent change control by the user department and approved as per SOP  for Change Control Process.
  • After assessment of all the possible impacts. Initiate the manufacturing of PV batch along with simultaneously the risk assessment report.
  • Take initial three consecutive batches of new product for PV.
  • Identify all the critical process parameters in the protocol for the particular product and Manufacture the batch by referring the tentative limit as provided in MPS.

                  Note:-

  • Consider the tentative limits of  critical process parameter and their control limit mentioned in the MPS .
  • Record if any deviation observed during the manufacturing process in the respective BMR.
  • QA/ FDD representatives shall verify such deviations and write the appropriate remark in the concern page of BMR.
  • In case more parameters of any stage needs to established.  Attache an addendum to the concern pages with sign and date of Production, QA and FDD representatives.
  • Establish parameters which are indicative and during PV shall established /freezed after successful completion of PV
  • The actual reading obtained during wet granulation is likely to vary from the limit mentioned from the MPS.
  • Similarly the limits provided in MPS for Hardness/thickness/ yields are indicative only and need to establish during PV.
  • Establish all identified Tentative/ indicative established parameters highlighted by following Good Documentation Practices like application of Symbols and putting remarks against the symbol on the same page.
  • Perform the challenge study during process validation batches, which shall include Full Hopper, Low Hopper, Low RPM, High RPM, Low Hardness, High Hardness and .
  • Perform the impact of challenge study for minimum 30 minutes or based on risk assessment, studied on final product.

Update the processing parameter based on findings.

  • Perform the challenge study at the start of the compression operation after initial machine setting verified by QA.
  • QA shall prepare the protocol for PV and carryout sampling and testing of physical parameter as per the approved protocol.
  • QA shall verify the parameter and record in the protocol/BMR.
  • To monitor and record the NRR details at various stages (Granulation & Compression) of manufacturing in case of Bilayer Tablet as per Annexure 7.
  • Requirement of any of the PV batches shall decided by FDD/QC/QA and shall part of PPQ.
  • Based on product, process, technical criticality, Adopt the reduced sampling plan and Mention the details in the sampling plan of respective protocol.
  • Collect the sample for chemical analysis and sent to QC by QA.
  • After completion of the analysis, QC shall submit the analytical reports to QA and QA shall prepare the PV report.
  • QA Head and Quality Head shall approve the Process Validation report.
  • Revise the BMR and BPR after completion of process validation report.
  • QA shall maintain status of process validation batches of new product and existing product as per given Annexure 2.

6.0    ABBREVIATIONS:

  • cGMP :           current Good Manufacturing Practices
  • FDD :           Formulation and Development Department
  • HVAC :           Heating Ventilation and Air Conditioning
  • MPS :           Master Product Specification
  • PMS :           Planned Modification System
  • PPQ :           Process Performance Qualification
  • PVMP :           Process Validation Master Plan
  • RPM :           Rotation per minute
  • R&D :           Research and Development
  • SOP :           Standard Operating Procedure
  • VMP :           Validation Master Plan

7.0    DEFINITION:

Manufacturing process:.

  • Transformation of starting materials into finished products through a single operation or a sequence of operations involving processing equipment, environmental control, personnel and documentation.

Validation :

  • Validation is the documented act of proving that
  • Any procedure,
  • Activity or system actually leads to the expected result.

Product life cycle:

  • Stages through which a product moves from its inception till its discontinuation. It includes pharmaceutical development. technology transfer and commercial production up to product discontinuation.

8.0    ANNEXURES:

Flow chart of validation approach (annexure 1), status of process validation batches  (annexure 2), protocol template (annexure 3).

Cover Page :

Note : This protocol can be customized as per the product, process, technology involved in the processes of any product.

  • Table of Contents
  • The purpose of this protocol is to establish documented evidence, which will provide a high degree of assurance that the adopted manufacturing process methodology for the product ………………… is capable of providing consistent and reproducible result as per the pre-defined specification & its quality characteristics/attributes
  • This protocol is applicable for Process Qualification of …………, batch size …….. Lac /…………..Kg to be manufactured  at _____________________________________;
  • Responsibility

Quality Assurance Department

  • Prepare, review, approve and execution of protocol.
  • Provide training to concerned personnel.
  • Withdraw the samples as per the sampling plan.
  • Monitor validation activities.
  • Review the validation data, and
  • Provide the final conclusion of the Process qualification in the reports

Production Department

  • Involve trained personnel in manufacturing activities.
  • Perform processing of the batch as per the batch manufacturing records.
  • Ensure that qualified equipment’s are used.
  • Assist Quality Assurance department in withdrawal of sample.
  • Review process qualification protocol

Quality Control Department

  • Perform testing and analysis in support of the validation activity.
  • Generate and compile the analysis data.
  • Review the process qualification protocol.

Maintenance Department

  • Ensure that all processing equipment and instruments are in a state of qualified and calibrated.
  • Provide proper utility services during the processing of qualification batches.

       5.0    Training Record

  • Training shall be imparted to all concerned personnel up to the operator level involved prior to execution of this protocol.

        6.0 Abbreviations

  • _____________________________________________________________________

         7.0 Validation Approach

  •   Product ……………….. (Each Film coated tablets contain ………………………… mg) is a new product at this location, the batch size is determined ……… lacs /……… kg, this product will be manufactured in granulation area….. This Product contains step like …………………………
  • Thus to validate the manufacturing process, three consecutive batches will be considered and sample shall be collected at appropriate stage as per sampling plan. The equipment set will be remained identical for all three validation batches.

      8.0 Process flow chart

     9.0   List and quantity of Raw Materials ( Batch Size ………Lacs / ……. kg)

     10.0 List and quantity of Packing Materials ( Batch Size …… lacs)

    11.0   List of Equipment’s used in Manufacturing & Packing and Its Qualification Status

       12.0 Key Process Parameter and its Control

  • Following key parameters are required to be validated for the manufacturing of  Product Name

        13.0  Sampling Plan and Acceptance Criteria 

     14.0 selection of batches.

  • Three consecutive batches shall be selected for process qualification having same / identified set of equipment

       15.0 Deviations/Incidents

  • If any deviation or incident observed in the process qualification batches shall be discussed and resolved as per SOP  and shall be recorded in the process qualification report.

      16.0  Change Control Management

  • If any change observed in the process qualification batches shall be allowed only through Change control Management procedure and shall be recorded in the process qualification report.
  • Change control not required for indicative parameter.

      17.0  Re-validation criteria

  • Change in Manufacturing Process
  • Process Parameter changes
  • Change in make, model or capacity of the equipment
  • Batch size change
  • Change in the Vendor of API& excipients
  • Every five years if no change in process, equipment’s, raw materials and composition

       18.0 Validation Report

  • Validation report shall be prepared by compiling the data obtained from three consecutive batches and a conclusion shall be drawn.
  • The data generated during the qualification activity shall be attached with the process validation report.

       19.0  Diagram of RMG & Occupancy of material in RMG

         20.0 Diagram of FBD Bowl & Bunker

       21.0  Diagram of Hopper

        22.0 Approval of Protocol

  • This protocol has been studied for adequacy and approved by the following responsible personnel

Report Template (Annexure 4)

  • Refer above (Annexure 3 ) with filled data

In process  Sheet (Annexure 5)

Attachment of In process sheet – click to  Download

Also visit : Technology Transfer of Drug Product

pharmabeginers

You might also like, environmental monitoring guide – non sterile facility, sterilizing and depyrogenating tunnel – pq protocol, stability study sop as per ich guideline, this post has 5 comments.

Pingback: Continued Process Verification Guideline & SOP - Pharma Beginners

Pingback: SOP for Drug Product Recall & Mock Recall - Pharma Beginners

Pingback: SOP for Protocol and Report Numbering System - Pharma Beginners

Pingback: Cleaning Validation master plan (CVMP)-New Approach - Pharma Beginners

Pingback: Document Management System in Pharmaceuticals - Pharma Beginners

Leave a Reply Cancel reply

You must be logged in to post a comment.

Processes, auto-response rules, validation rules, assignment rules, and workflow rules

Join the  Getting Started Community : Ask questions. Get answers. Share experiences. Create a workflow rule - Get step-by-step help on setting up a workflow rule   and be aware of the  Workflow Limits . Time triggers and time-dependent considerations - Learn how to create a time-based rule . For more specific content on workflow rule migration on Trailhead, please review  Workflow Rule Migration . Process Builder limits and considerations - Before you get started with processes,  make sure you understand the limits and considerations .   You may also find videos on YouTube.  Want even more information? If you've got a real thirst for knowledge, the Salesforce Trailhead will be what you're looking for. These in-depth resources let you pick your level and can be done on your own time. To help you determine which automation to use, you may check out this video ! See also Validation Rules Set up Assignment Rules Set Up Auto-Response Rules Tips for Working with Picklist and Multi-Select Picklist Formula Fields Time triggers and time-dependent considerations  

Company Logo

Cookie Consent Manager

General information, required cookies, functional cookies, advertising cookies.

We use three kinds of cookies on our websites: required, functional, and advertising. You can choose whether functional and advertising cookies apply. Click on the different cookie categories to find out more about each category and to change the default settings. Privacy Statement

Required cookies are necessary for basic website functionality. Some examples include: session cookies needed to transmit the website, authentication cookies, and security cookies.

Functional cookies enhance functions, performance, and services on the website. Some examples include: cookies used to analyze site traffic, cookies used for market research, and cookies used to display advertising that is not directed to a particular individual.

Advertising cookies track activity across websites in order to understand a viewer’s interests, and direct them specific marketing. Some examples include: cookies used for remarketing, or interest-based advertising.

Cookie List

The Four Types of Process Validation

Process validation is defined as the collection and evaluation of data, from the process design stage throughout production, which establishes scientific evidence that a process is capable of consistently delivering quality products.

Process validation is a requirement of current Good Manufacturing Practices (GMPs) for finished pharmaceuticals (21CFR 211) and of the GMP regulations for medical devices (21 CFR 820) and therefore applies to the manufacture of both drug products and medical devices.

assignment process & validations

Process validation involves a series of activities taking place over the lifecycle of the product and process.

The U.S. Food and Drug Administration (FDA) has proposed guidelines with the following definition for process validation: – “PROCESS VALIDATION” is establishing documented evidence which provides a high degree of assurance that a specific process consistently produces a product meeting its predetermined specifications and quality attributes.

The process validation activities can be described in three stages.

Stage 1 – Process Design: The commercial process is defined during this stage based on knowledge gained through development and scale-up activities.

Stage 2 – Process Qualification: During this stage, the process design is confirmed as being capable of reproducible commercial manufacturing.

Stage 3 – Continued Process Verification: Ongoing assurance is gained during routine production that the process remains in a state of control.

Process validation is the bedrock of good manufacturing practice, it’s also the first step to realizing significant time and cost savings in validation. Make sure you’re in position to succeed and drive excellence in your organization by watching our webinar, The Three Stages of Process Validation, with our Process Validation SME, Olivia Calder.

You’ll learn more about:

  • The three stages of process validation
  • How digital validation accelerates process validation  
  • How a leading pharmaceutical company reduced cycle times by 50% with Kneat Gx

Types of process validation

The guidelines on general principles of process validation mentions four types of validation:

  • A) Prospective validation (or premarket validation)
  • B) Retrospective validation
  • C) Concurrent validation

D) Revalidation

A) Prospective Validation (or Premarket Validation)

Establishing documented evidence prior to process implementation that a system does what it proposed to do based on preplanned protocols. This approach to validation is normally undertaken whenever the process for a new formula (or within a new facility) must be validated before routine pharmaceutical production commences. In fact, validation of a process by this approach often leads to transfer of the manufacturing process from the development function to production.

B) Retrospective Validation

Retrospective validation is used for facilities, processes, and process controls in operation use that have not undergone a formally documented validation process. Validation of these facilities, processes, and process controls is possible using historical data to provide the necessary documentary evidence that the process is doing what it is believed to do. Therefore, this type of validation is only acceptable for well-established processes and will be inappropriate where there have been recent changes in the composition of product, operating processes, or equipment.

This approach is rarely used today because it’s very unlikely that any existing product hasn’t been subjected to the Prospective validation process. It is used only for the audit of a validated process.

C) Concurrent Validation

Concurrent validation is used for establishing documented evidence that a facility and processes do what they purport to do, based on information generated during actual imputation of the process. This approach involves monitoring of critical processing steps and end product testing of current production, to show that the manufacturing process is in a state of control.

Revalidation means repeating the original validation effort or any part of it, and includes investigative review of existing performance data. This approach is essential to maintain the validated status of the plant, equipment, manufacturing processes and computer systems. Possible reasons for starting the revalidation process include:

  • The transfer of a product from one plant to another.
  • Changes to the product, the plant, the manufacturing process, the cleaning process, or other changes that could affect product quality.
  • The necessity of periodic checking of the validation results.
  • Significant (usually order of magnitude) increase or decrease in batch size.
  • Sequential batches that fail to meet product and process specifications.
  • The scope of revalidation procedures depends on the extent of the changes and the effect upon the product.

Talk to an Expert

Kneat supports any of your validation needs with a purpose-built platform that digitizes the entire validation life cycle for greater speed and accuracy, improved transparency, and guaranteed data integrity compliance.

We’ve reduced cycle times by over 40% for eight of the world’s top ten pharmaceutical companies . See how you can experience the same value, book your personal demo today.

Sign up to our Newsletter

Find out how Kneat can make your validation easier, faster, and smarter. Start your paperless validation revolution by speaking to our experts.

  • Search Menu
  • Browse content in Arts and Humanities
  • Browse content in Archaeology
  • Anglo-Saxon and Medieval Archaeology
  • Archaeological Methodology and Techniques
  • Archaeology by Region
  • Archaeology of Religion
  • Archaeology of Trade and Exchange
  • Biblical Archaeology
  • Contemporary and Public Archaeology
  • Environmental Archaeology
  • Historical Archaeology
  • History and Theory of Archaeology
  • Industrial Archaeology
  • Landscape Archaeology
  • Mortuary Archaeology
  • Prehistoric Archaeology
  • Underwater Archaeology
  • Urban Archaeology
  • Zooarchaeology
  • Browse content in Architecture
  • Architectural Structure and Design
  • History of Architecture
  • Residential and Domestic Buildings
  • Theory of Architecture
  • Browse content in Art
  • Art Subjects and Themes
  • History of Art
  • Industrial and Commercial Art
  • Theory of Art
  • Biographical Studies
  • Byzantine Studies
  • Browse content in Classical Studies
  • Classical History
  • Classical Philosophy
  • Classical Mythology
  • Classical Literature
  • Classical Reception
  • Classical Art and Architecture
  • Classical Oratory and Rhetoric
  • Greek and Roman Epigraphy
  • Greek and Roman Law
  • Greek and Roman Papyrology
  • Greek and Roman Archaeology
  • Late Antiquity
  • Religion in the Ancient World
  • Digital Humanities
  • Browse content in History
  • Colonialism and Imperialism
  • Diplomatic History
  • Environmental History
  • Genealogy, Heraldry, Names, and Honours
  • Genocide and Ethnic Cleansing
  • Historical Geography
  • History by Period
  • History of Emotions
  • History of Agriculture
  • History of Education
  • History of Gender and Sexuality
  • Industrial History
  • Intellectual History
  • International History
  • Labour History
  • Legal and Constitutional History
  • Local and Family History
  • Maritime History
  • Military History
  • National Liberation and Post-Colonialism
  • Oral History
  • Political History
  • Public History
  • Regional and National History
  • Revolutions and Rebellions
  • Slavery and Abolition of Slavery
  • Social and Cultural History
  • Theory, Methods, and Historiography
  • Urban History
  • World History
  • Browse content in Language Teaching and Learning
  • Language Learning (Specific Skills)
  • Language Teaching Theory and Methods
  • Browse content in Linguistics
  • Applied Linguistics
  • Cognitive Linguistics
  • Computational Linguistics
  • Forensic Linguistics
  • Grammar, Syntax and Morphology
  • Historical and Diachronic Linguistics
  • History of English
  • Language Acquisition
  • Language Evolution
  • Language Reference
  • Language Variation
  • Language Families
  • Lexicography
  • Linguistic Anthropology
  • Linguistic Theories
  • Linguistic Typology
  • Phonetics and Phonology
  • Psycholinguistics
  • Sociolinguistics
  • Translation and Interpretation
  • Writing Systems
  • Browse content in Literature
  • Bibliography
  • Children's Literature Studies
  • Literary Studies (Asian)
  • Literary Studies (European)
  • Literary Studies (Eco-criticism)
  • Literary Studies (Romanticism)
  • Literary Studies (American)
  • Literary Studies (Modernism)
  • Literary Studies - World
  • Literary Studies (1500 to 1800)
  • Literary Studies (19th Century)
  • Literary Studies (20th Century onwards)
  • Literary Studies (African American Literature)
  • Literary Studies (British and Irish)
  • Literary Studies (Early and Medieval)
  • Literary Studies (Fiction, Novelists, and Prose Writers)
  • Literary Studies (Gender Studies)
  • Literary Studies (Graphic Novels)
  • Literary Studies (History of the Book)
  • Literary Studies (Plays and Playwrights)
  • Literary Studies (Poetry and Poets)
  • Literary Studies (Postcolonial Literature)
  • Literary Studies (Queer Studies)
  • Literary Studies (Science Fiction)
  • Literary Studies (Travel Literature)
  • Literary Studies (War Literature)
  • Literary Studies (Women's Writing)
  • Literary Theory and Cultural Studies
  • Mythology and Folklore
  • Shakespeare Studies and Criticism
  • Browse content in Media Studies
  • Browse content in Music
  • Applied Music
  • Dance and Music
  • Ethics in Music
  • Ethnomusicology
  • Gender and Sexuality in Music
  • Medicine and Music
  • Music Cultures
  • Music and Religion
  • Music and Media
  • Music and Culture
  • Music Education and Pedagogy
  • Music Theory and Analysis
  • Musical Scores, Lyrics, and Libretti
  • Musical Structures, Styles, and Techniques
  • Musicology and Music History
  • Performance Practice and Studies
  • Race and Ethnicity in Music
  • Sound Studies
  • Browse content in Performing Arts
  • Browse content in Philosophy
  • Aesthetics and Philosophy of Art
  • Epistemology
  • Feminist Philosophy
  • History of Western Philosophy
  • Metaphysics
  • Moral Philosophy
  • Non-Western Philosophy
  • Philosophy of Science
  • Philosophy of Language
  • Philosophy of Mind
  • Philosophy of Perception
  • Philosophy of Action
  • Philosophy of Law
  • Philosophy of Religion
  • Philosophy of Mathematics and Logic
  • Practical Ethics
  • Social and Political Philosophy
  • Browse content in Religion
  • Biblical Studies
  • Christianity
  • East Asian Religions
  • History of Religion
  • Judaism and Jewish Studies
  • Qumran Studies
  • Religion and Education
  • Religion and Health
  • Religion and Politics
  • Religion and Science
  • Religion and Law
  • Religion and Art, Literature, and Music
  • Religious Studies
  • Browse content in Society and Culture
  • Cookery, Food, and Drink
  • Cultural Studies
  • Customs and Traditions
  • Ethical Issues and Debates
  • Hobbies, Games, Arts and Crafts
  • Lifestyle, Home, and Garden
  • Natural world, Country Life, and Pets
  • Popular Beliefs and Controversial Knowledge
  • Sports and Outdoor Recreation
  • Technology and Society
  • Travel and Holiday
  • Visual Culture
  • Browse content in Law
  • Arbitration
  • Browse content in Company and Commercial Law
  • Commercial Law
  • Company Law
  • Browse content in Comparative Law
  • Systems of Law
  • Competition Law
  • Browse content in Constitutional and Administrative Law
  • Government Powers
  • Judicial Review
  • Local Government Law
  • Military and Defence Law
  • Parliamentary and Legislative Practice
  • Construction Law
  • Contract Law
  • Browse content in Criminal Law
  • Criminal Procedure
  • Criminal Evidence Law
  • Sentencing and Punishment
  • Employment and Labour Law
  • Environment and Energy Law
  • Browse content in Financial Law
  • Banking Law
  • Insolvency Law
  • History of Law
  • Human Rights and Immigration
  • Intellectual Property Law
  • Browse content in International Law
  • Private International Law and Conflict of Laws
  • Public International Law
  • IT and Communications Law
  • Jurisprudence and Philosophy of Law
  • Law and Politics
  • Law and Society
  • Browse content in Legal System and Practice
  • Courts and Procedure
  • Legal Skills and Practice
  • Primary Sources of Law
  • Regulation of Legal Profession
  • Medical and Healthcare Law
  • Browse content in Policing
  • Criminal Investigation and Detection
  • Police and Security Services
  • Police Procedure and Law
  • Police Regional Planning
  • Browse content in Property Law
  • Personal Property Law
  • Study and Revision
  • Terrorism and National Security Law
  • Browse content in Trusts Law
  • Wills and Probate or Succession
  • Browse content in Medicine and Health
  • Browse content in Allied Health Professions
  • Arts Therapies
  • Clinical Science
  • Dietetics and Nutrition
  • Occupational Therapy
  • Operating Department Practice
  • Physiotherapy
  • Radiography
  • Speech and Language Therapy
  • Browse content in Anaesthetics
  • General Anaesthesia
  • Neuroanaesthesia
  • Browse content in Clinical Medicine
  • Acute Medicine
  • Cardiovascular Medicine
  • Clinical Genetics
  • Clinical Pharmacology and Therapeutics
  • Dermatology
  • Endocrinology and Diabetes
  • Gastroenterology
  • Genito-urinary Medicine
  • Geriatric Medicine
  • Infectious Diseases
  • Medical Toxicology
  • Medical Oncology
  • Pain Medicine
  • Palliative Medicine
  • Rehabilitation Medicine
  • Respiratory Medicine and Pulmonology
  • Rheumatology
  • Sleep Medicine
  • Sports and Exercise Medicine
  • Clinical Neuroscience
  • Community Medical Services
  • Critical Care
  • Emergency Medicine
  • Forensic Medicine
  • Haematology
  • History of Medicine
  • Browse content in Medical Dentistry
  • Oral and Maxillofacial Surgery
  • Paediatric Dentistry
  • Restorative Dentistry and Orthodontics
  • Surgical Dentistry
  • Browse content in Medical Skills
  • Clinical Skills
  • Communication Skills
  • Nursing Skills
  • Surgical Skills
  • Medical Ethics
  • Medical Statistics and Methodology
  • Browse content in Neurology
  • Clinical Neurophysiology
  • Neuropathology
  • Nursing Studies
  • Browse content in Obstetrics and Gynaecology
  • Gynaecology
  • Occupational Medicine
  • Ophthalmology
  • Otolaryngology (ENT)
  • Browse content in Paediatrics
  • Neonatology
  • Browse content in Pathology
  • Chemical Pathology
  • Clinical Cytogenetics and Molecular Genetics
  • Histopathology
  • Medical Microbiology and Virology
  • Patient Education and Information
  • Browse content in Pharmacology
  • Psychopharmacology
  • Browse content in Popular Health
  • Caring for Others
  • Complementary and Alternative Medicine
  • Self-help and Personal Development
  • Browse content in Preclinical Medicine
  • Cell Biology
  • Molecular Biology and Genetics
  • Reproduction, Growth and Development
  • Primary Care
  • Professional Development in Medicine
  • Browse content in Psychiatry
  • Addiction Medicine
  • Child and Adolescent Psychiatry
  • Forensic Psychiatry
  • Learning Disabilities
  • Old Age Psychiatry
  • Psychotherapy
  • Browse content in Public Health and Epidemiology
  • Epidemiology
  • Public Health
  • Browse content in Radiology
  • Clinical Radiology
  • Interventional Radiology
  • Nuclear Medicine
  • Radiation Oncology
  • Reproductive Medicine
  • Browse content in Surgery
  • Cardiothoracic Surgery
  • Gastro-intestinal and Colorectal Surgery
  • General Surgery
  • Neurosurgery
  • Paediatric Surgery
  • Peri-operative Care
  • Plastic and Reconstructive Surgery
  • Surgical Oncology
  • Transplant Surgery
  • Trauma and Orthopaedic Surgery
  • Vascular Surgery
  • Browse content in Science and Mathematics
  • Browse content in Biological Sciences
  • Aquatic Biology
  • Biochemistry
  • Bioinformatics and Computational Biology
  • Developmental Biology
  • Ecology and Conservation
  • Evolutionary Biology
  • Genetics and Genomics
  • Microbiology
  • Molecular and Cell Biology
  • Natural History
  • Plant Sciences and Forestry
  • Research Methods in Life Sciences
  • Structural Biology
  • Systems Biology
  • Zoology and Animal Sciences
  • Browse content in Chemistry
  • Analytical Chemistry
  • Computational Chemistry
  • Crystallography
  • Environmental Chemistry
  • Industrial Chemistry
  • Inorganic Chemistry
  • Materials Chemistry
  • Medicinal Chemistry
  • Mineralogy and Gems
  • Organic Chemistry
  • Physical Chemistry
  • Polymer Chemistry
  • Study and Communication Skills in Chemistry
  • Theoretical Chemistry
  • Browse content in Computer Science
  • Artificial Intelligence
  • Computer Architecture and Logic Design
  • Game Studies
  • Human-Computer Interaction
  • Mathematical Theory of Computation
  • Programming Languages
  • Software Engineering
  • Systems Analysis and Design
  • Virtual Reality
  • Browse content in Computing
  • Business Applications
  • Computer Security
  • Computer Games
  • Computer Networking and Communications
  • Digital Lifestyle
  • Graphical and Digital Media Applications
  • Operating Systems
  • Browse content in Earth Sciences and Geography
  • Atmospheric Sciences
  • Environmental Geography
  • Geology and the Lithosphere
  • Maps and Map-making
  • Meteorology and Climatology
  • Oceanography and Hydrology
  • Palaeontology
  • Physical Geography and Topography
  • Regional Geography
  • Soil Science
  • Urban Geography
  • Browse content in Engineering and Technology
  • Agriculture and Farming
  • Biological Engineering
  • Civil Engineering, Surveying, and Building
  • Electronics and Communications Engineering
  • Energy Technology
  • Engineering (General)
  • Environmental Science, Engineering, and Technology
  • History of Engineering and Technology
  • Mechanical Engineering and Materials
  • Technology of Industrial Chemistry
  • Transport Technology and Trades
  • Browse content in Environmental Science
  • Applied Ecology (Environmental Science)
  • Conservation of the Environment (Environmental Science)
  • Environmental Sustainability
  • Environmentalist Thought and Ideology (Environmental Science)
  • Management of Land and Natural Resources (Environmental Science)
  • Natural Disasters (Environmental Science)
  • Nuclear Issues (Environmental Science)
  • Pollution and Threats to the Environment (Environmental Science)
  • Social Impact of Environmental Issues (Environmental Science)
  • History of Science and Technology
  • Browse content in Materials Science
  • Ceramics and Glasses
  • Composite Materials
  • Metals, Alloying, and Corrosion
  • Nanotechnology
  • Browse content in Mathematics
  • Applied Mathematics
  • Biomathematics and Statistics
  • History of Mathematics
  • Mathematical Education
  • Mathematical Finance
  • Mathematical Analysis
  • Numerical and Computational Mathematics
  • Probability and Statistics
  • Pure Mathematics
  • Browse content in Neuroscience
  • Cognition and Behavioural Neuroscience
  • Development of the Nervous System
  • Disorders of the Nervous System
  • History of Neuroscience
  • Invertebrate Neurobiology
  • Molecular and Cellular Systems
  • Neuroendocrinology and Autonomic Nervous System
  • Neuroscientific Techniques
  • Sensory and Motor Systems
  • Browse content in Physics
  • Astronomy and Astrophysics
  • Atomic, Molecular, and Optical Physics
  • Biological and Medical Physics
  • Classical Mechanics
  • Computational Physics
  • Condensed Matter Physics
  • Electromagnetism, Optics, and Acoustics
  • History of Physics
  • Mathematical and Statistical Physics
  • Measurement Science
  • Nuclear Physics
  • Particles and Fields
  • Plasma Physics
  • Quantum Physics
  • Relativity and Gravitation
  • Semiconductor and Mesoscopic Physics
  • Browse content in Psychology
  • Affective Sciences
  • Clinical Psychology
  • Cognitive Psychology
  • Cognitive Neuroscience
  • Criminal and Forensic Psychology
  • Developmental Psychology
  • Educational Psychology
  • Evolutionary Psychology
  • Health Psychology
  • History and Systems in Psychology
  • Music Psychology
  • Neuropsychology
  • Organizational Psychology
  • Psychological Assessment and Testing
  • Psychology of Human-Technology Interaction
  • Psychology Professional Development and Training
  • Research Methods in Psychology
  • Social Psychology
  • Browse content in Social Sciences
  • Browse content in Anthropology
  • Anthropology of Religion
  • Human Evolution
  • Medical Anthropology
  • Physical Anthropology
  • Regional Anthropology
  • Social and Cultural Anthropology
  • Theory and Practice of Anthropology
  • Browse content in Business and Management
  • Business Strategy
  • Business Ethics
  • Business History
  • Business and Government
  • Business and Technology
  • Business and the Environment
  • Comparative Management
  • Corporate Governance
  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic Systems
  • Economic History
  • Economic Methodology
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Natural Disasters (Environment)
  • Social Impact of Environmental Issues (Social Science)
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • International Political Economy
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Theory
  • Politics and Law
  • Public Administration
  • Public Policy
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Developmental and Physical Disabilities Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Personnel Assessment and Selection

  • < Previous chapter
  • Next chapter >

6 The Concept of Validity and the Process of Validation

Paul R. Sackett, Department of Psychology, University of Minnesota, Minneapolis, MN

Dan J. Putka, Human Resources Research Organization (HumRRO), Alexandria, VA

Rodney A. McCloy, Human Resources Research Organization (HumRRO), Louisville, KY

  • Published: 21 November 2012
  • Cite Icon Cite
  • Permissions Icon Permissions

In this chapter we first set the stage by focusing on the concept of validity, documenting key changes over time in how the term is used and examining the specific ways in which the concept is instantiated in the domain of personnel selection. We then move from conceptual to operational and discuss issues in the use of various strategies to establish what we term the predictive inference , namely, that scores on the predictor measure of interest can be used to draw inferences about an individual's future job behavior or other criterion of interest. Finally, we address a number of specialized issues aimed at illustrating some of the complexities and nuances of validation.

Validity, according to the 1999 Standards for Educational and Psychological Testing , is “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (p. 9). Although issues of validity are relevant to all fields of psychological measurement, our focus in this chapter is on the concept of validity and the process of validation as it applies to the personnel selection field. The chapter has three main sections. The first sets the stage by focusing on the concept of validity, documenting key changes over time in how the term is used and examining the specific ways in which the concept is instantiated in the domain of personnel selection. The second section moves from conceptual to operational and discusses issues in the use of various strategies to establish what we term the “predictive inference,” namely, that scores on the predictor measure of interest can be used to draw inferences about an individual's future job behavior or other criterion of interest. The final section addresses a number of specialized issues aimed at illustrating some of the complexities and nuances of validation.

The Concept of Validity

There is a long history and a sizable literature on the concept of validity. We attempt to highlight a set of important issues in the ongoing development of thinking about validity, but direct the interested reader to three sets of key resources for a strong foundation on the topic. One key reference is the set of chapters on the topic of validity in the four editions of Educational Measurement , which is that field's analog to the Handbook of Industrial and Organizational Psychology . Cureton ( 1951 ), Cronbach ( 1971 ), Messick ( 1989 ), and Kane ( 2006 ) offer detailed treatment of the evolving conceptualizations of validity. A second key set of references focuses specifically on validity in the context of personnel selection. There have been two prominent articles on validity in the employment context published in the American Psychologist (Guion, 1974 ; Landy, 1986 ). There is also a very influential paper by Binning and Barrett ( 1989 ). A third key set comprises classic, highly cited articles in psychology: Cronbach and Meehl's ( 1955 ) and Loevinger's ( 1957 ) treatises on construct validity.

In this section we outline several issues that we view as central to an understanding of validity. The material in this section is drawn from our prior writing on the topic (Putka & Sackett, 2010 ).

Issue 1: Validity as Predictor-Criterion Relation versus Broader Conceptualizations

In the first half of the twentieth century, validity was commonly viewed solely in terms of the strength of predictor-criterion relations. Cureton's ( 1951 ) chapter on validity stated, reasonably, that validity addresses the question of “how well a test does the job it was employed to do” (p. 621). But the “job it was employed to do” was viewed as one of prediction, leading Cureton to state that “validity is . . . defined in terms of the correlation between the actual test scores and the ‘true’ criterion measures” (pp. 622–623).

But more questions were being asked of tests than whether they predicted a criterion of interest. These included questions about whether mastery of a domain could be inferred from a set of questions sampling that domain and about whether a test could be put forward as a measure of a specified psychological construct. A landmark event in the intellectual history of the concept of validity was the publication of the first edition of what is now known as the Standards for Educational and Psychological Testing (American Psychological Association, 1954 ), in which a committee headed by Lee Cronbach, with Paul Meehl as a key member, put forward the now-familiar notions of predictive, concurrent, content, and construct validity. Cronbach and Meehl ( 1955 ) elaborated their position on construct validity a year later in their seminal Psychological Bulletin paper. Since then, validity has been viewed more broadly than predictor-criterion correlations, with the differing validity labels viewed at first as types of validity, and more recently as different types of validity evidence or as evidence relevant to differing inferences to be drawn from test scores.

Issue 2: Validity of an Inference versus Validity of a Test

Arguably the single most essential idea regarding validity is that it refers to the degree to which evidence supports inferences one proposes to draw about the target of assessment [in the Industrial-Organizational (I-O) world, most commonly an individual; in other settings, a larger aggregate, such as a classroom or a school] from their scores on assessment devices. The generic question, “Is this a valid test?” is not a useful one; rather, the question should be “Can a specified inference about the target of assessment be validly drawn from scores on this device?” Several important notions follow from this position.

First, it thus follows that the inferences to be made must be clearly specified. It is often the case that multiple inferences are proposed. Consider a technical report stating “This test representatively samples the established training curriculum for this job. It measures four subdomains of job knowledge, each of which is predictive of subsequent on-the-job task performance.” Note that three claims are made here, dealing with sampling, dimensionality, and prediction, respectively. Each claim is linked to one or more inferences about an examinee (i.e., degree of curriculum mastery, differentiation across subdomains, relations with subsequent performance, and incremental prediction of performance across subdomains).

Second, when a multifaceted set of claims is made about inferences that can be drawn from the test, support for each claim is needed. Each inference may require a different type of evidence. The claim of representative content sampling may be supported by evidence of the form historically referred to as “content validity evidence,” namely, a systematic documentation of the relation between test content and curriculum content, typically involving the judgment of subject matter experts. The claim of multidimensionality may be supported by factor-analytic evidence, and evidence in support of this claim is one facet of what has historically been referred to as “construct validity evidence,” i.e., evidence regarding whether the test measures what it purports to measure. The claim of prediction of subsequent task performance may be supported by what has historically been referred to as “criterion-related validity evidence,” namely, evidence of an empirical relation between test scores and subsequent performance. Note that the above types of evidence are provided as examples; it is commonly the case that multiple strategies may be selected alone or in combination as the basis of support for a given inference. For example, obtaining empirical evidence of a test-criterion relation may not be feasible in a given setting due to sample size limitations, and the investigator may turn to the systematic collection of expert judgment as to the likelihood that performance on various test components is linked to subsequent job performance.

Third, some proposed inferences might receive support as evidence is gathered and evaluated, whereas others might not. In the current example, what might emerge is strong support for the claim of representative curriculum sampling and prediction of subsequent performance instead of evidence in support of a unidimensional rather than the hypothesized multidimensional structure. In such cases, one should revise the claims made for the test, in this case dropping the claim that inferences can be drawn about differential standing on subdomains of knowledge.

Issue 3: Types of Validity Evidence versus Types of Validity

Emerging from the 1954 edition of what is now the Standards for Educational and Psychological Testing was the notion of multiple types of validity. The “triumvirate” of criterion-related validity, content validity, and construct validity came to dominate writings about validity. At one level, this makes perfect sense: Each deals with different key inferences one may wish to draw about a test. First, in some settings, such as many educational applications, the key inference is one of content sampling. Using tests for purposes such as determining whether a student passes a course, progresses to the next grade, or merits a diploma relies heavily on the adequacy with which a test samples the specified curriculum. Second, in some settings, such as the study of personality, the key inference is one of appropriateness of construct labeling. There is a classic distinction (Loevinger, 1957 ) between two types of construct validity questions, namely, questions about the existence of a construct (e.g., can one define a construct labeled “integrity” and differentiate it from other constructs?) and questions about the adequacy of a given measure of a construct (e.g., can test X be viewed as a measure of integrity?). Third, in some settings, such as personnel selection, the key inference is one of prediction: Can scores from measures gathered prior to a selection decision be used to draw inferences about future job behavior?

Over the past several decades, there has been a move from viewing these as types of validity to viewing them as types of validity evidence. All lines of evidence—content sampling, dimensionality, convergence with other measures, investigations of the processes by which examinees respond to test stimuli, or relations with external criteria—deal with understanding the meaning of test scores and the inferences that can be drawn from them. As construct validity is the term historically applied to questions concerning the meaning of test scores, the position emerged that if all forms of validity evidence contributed to understanding the meaning of test scores, then all forms of validity evidence were really construct validity evidence. The 1999 edition of the Standards pushed this one step further: If all forms of evidence are construct validity evidence, then “validity” and “construct validity” are indistinguishable. Thus the Standards refer to “validity,” rather than “construct validity,” as the umbrella term. This seems useful, as “construct validity” carries the traditional connotations of referring to specific forms of validity evidence, namely, convergence with conceptually related measures and divergence from conceptually unrelated measures.

Thus, the current perspective reflected in the 1999 Standards is that validity refers to the evidentiary basis supporting the inferences that a user claims can be drawn from a test score. Many claims are multifaceted, and thus multiple lines of evidence may be needed to support the claims made for a test. A common misunderstanding of this perspective on validity is that the test user's burden has been increased, as the user now needs to provide each of the types of validity evidence. In fact, there is no requirement that all forms of validity evidence be provided; rather, the central notion is, as noted earlier, that evidence needs to be provided for the inferences one claims can be drawn from test scores. If the intended inferences make no claims about content sampling, for example, content-related evidence is not needed. If the claim is simply that scores on a measure can be used to forecast whether an individual will voluntarily leave the organization within a year of hire, the only inference that needs to be supported is the predictive one. One may rightly assert that scientific understanding is aided by obtaining other types of evidence than those drawn on to support the predictive inference [i.e., forms of evidence that shed light on the construct(s) underlying test scores], but we view such evidence gathering as desirable but not essential. One's obligation is simply to provide evidence in support of the inferences one wishes to draw.

Issue 4: Validity as an Inference about a Test Score versus Validity as a Strategy for Establishing Job Relatedness

In employment settings, the most crucial inference to be supported about any measure is whether the measure is job related. Labeling a measure as job related means “scores on this measure can be used to draw inferences about an individual's future job behavior”; we term this the “predictive inference.” In personnel selection settings, our task is to develop a body of evidence to support the predictive inference. The next section of this chapter outlines mechanisms for doing so.

Some potential confusion arises from the failure to differentiate between settings in which types of validity evidence are being used to draw inferences about the meaning of test scores rather than to draw a predictive inference. For example, content-related validity evidence refers to the adequacy with which the content of a given measure samples a specified content domain. Assume that one is attempting to develop a self-report measure of conscientiousness to reflect a particular theory that specifies that conscientiousness has four equally important subfacets: dependability, achievement striving, dutifulness, and orderliness. Assume that a group of expert judges is given the task of sorting the 40 test items into these four subfacets. A finding that 10 items were rated as reflecting each of the four facets would be evidence in support of the inference of adequate domain sampling, and contribute to an inference about score meaning. Note that this inference is independent of questions about the job relatedness of this measure. One could draw on multiple lines of evidence to further develop the case for this measure as an effective way to measure conscientiousness (e.g., convergence with other measures) without ever addressing the question of whether predictive inferences can be drawn from this measure for a given job. When one's interest is in the predictive hypothesis, various types of validity evidence can be drawn upon to support this evidence, as outlined below.

Issue 5: Validity Limited to Inferences about Individuals versus Including Broader Consequences of Test Score Use

In the past two decades, considerable attention has been paid to new views of validity that extend beyond the inferences that can be drawn about individuals to include a consideration of the consequences of test use. The key proponent of this position is Messick (1989), who noted that it is commonly asserted that the single most important attribute of a measure is that it is valid for its intended uses. He noted that at times test use has unintended negative consequences, as in the case in which a teacher abandons many key elements of a curriculum in order to focus all effort on preparing students to be tested in one subject. Even if inferences about student domain mastery in that subject can be drawn with high accuracy, Messick argued that the negative consequences (i.e., ignoring other subjects) may be so severe as to argue against the use of this test. If validity is the most important attribute of a test, then the only way for negative consequences to have the potential to outweigh validity evidence in a decision about the appropriateness of test use was for consequences of test use to be included as a facet of validity. Messick therefore argued for a consideration of both traditional aspects of validity (which he labeled “evidential”) and these new aspects of validity (which he labeled “consequential”). These ideas were generally well received in educational circles, and the term “consequential validity” came to be used. In this usage, a measure with unintended negative consequences lacks consequential validity. This perspective views such negative consequences as invalidating test use.

The 1999 Standards rejects this view. Although evidence of negative consequences may influence decisions concerning the use of predictors, such evidence will be related to inferences about validity only if the negative consequences can be directly traced to the measurement properties of the predictor. Using an example from the SIOP Principles for the Validation and Use of Personnel Selection Procedures (2003), consider an organization that (1) introduces an integrity test to screen applicants, (2) assumes that this selection procedure provides an adequate safeguard against employee theft, and (3) discontinues use of other theft-deterrent methods (e.g., video surveillance). In such an instance, employee theft might actually increase after the integrity test is introduced and other organizational procedures are eliminated. Thus, the intervention may have had an unanticipated negative consequence on the organization. These negative consequences do not threaten the validity of inferences that can be drawn from scores on the integrity test, as the consequences are not a function of the test itself.

Issue 6: The Predictive Inference versus the Evidence for It

As noted above, the key inference in personnel selection settings is a predictive one, namely, the inferences that scores on the test or other selection procedure can be used to predict the test takers’ subsequent job behavior. A common error is to equate the type of inference to be drawn with the type of evidence needed to support the inference. Put most bluntly, the error is to assert that “if the inference is predictive, then the needed evidence is criterion-related evidence of the predictive type.”

Scholars in the I-O area have clearly articulated that there are multiple routes to providing evidence in support of the predictive hypothesis. Figure 6.1 presents this position in visual form. Models of this sort are laid out in Binning and Barrett ( 1989 ) and in the 1999 Standards . The upper half of this figure shows a measured predictor and a measured criterion. As both are measured, the relation between them can be empirically established. The lower half of the figure shows an unmeasured predictor construct domain and an unmeasured criterion construct domain. Of interest is the set of linkages between the four components of this model.

The first and most central point is that the goal of validation research in the personnel selection context is to establish a linkage between the predictor measure (upper left) and the criterion construct domain (lower right). The criterion construct domain is the conceptual specification of the set of work behaviors that one wants to predict. This criterion construct domain may be quite formal and elaborate, as in the case of a job analytically specified set of critical job tasks, or it may be quite simple and intuitive, as in the case of an organization that wishes to minimize voluntary turnover within the first year of employment and thus specifies that this is the criterion domain of interest.

The second central point is that there are three possible mechanisms for linking an observed predictor score and a criterion construct domain. The first entails a sampling strategy. If the predictor measure is a direct sample of the criterion construct domain, then the predictive inference is established based on expert judgment (e.g., obtained via a job analysis process) (Linkage 5 in Figure 6.1 ). Having an applicant for a symphony orchestra position sight read unfamiliar music is a direct sample of this important job behavior. Having an applicant for a lifeguard position dive to the bottom of a pool to rescue a simulated drowning victim is a simulation, rather than a direct sample of the criterion construct domain. It does, however, rely on domain sampling logic and, like most work sample tests, aims at psychological fidelity in representing critical aspects of the construct domain.

The second mechanism for linking an observed predictor and a criterion construct domain is by establishing a pair of linkages, namely (1) the observed predictor–observed criterion link (Linkage 1 in Figure 6.1 ), and (2) the observed criterion–criterion construct domain link (Linkage 4 in Figure 6.1 ). The first of these can be established empirically, as in the case of local criterion-related evidence, or generalized or transported evidence. Critically, such evidence must be paired with evidence that the criterion measure (e.g., ratings of job performance) can be linked to the criterion construct domain (e.g., actual performance behaviors). Such evidence can be judgmental (e.g., comparing criterion measure content to critical elements of the criterion construct domain revealed through job analyses) and empirical (e.g., fitting confirmatory factor analysis models to assess whether the dimensionality of the observed criterion scores is consistent with the hypothesized dimensionality of the criterion construct domain). It commonly involves showing that the chosen criterion measures do reflect important elements of the criterion construct domain. Observed measures may fail this test, as in the case of a classroom instructor who grades solely on attendance when the criterion construct domain is specified in terms of knowledge acquisition, or in the case of a criterion measure for which variance is largely determined by features of the situation rather than by features under the control of the individuals.

Routes to Establishing the Predictive Inference.

The third mechanism also focuses on a pair of linkages: one between the observed predictor scores and the predictor construct domain (Linkage 2 in Figure 6.1 ), and the other between the predictor construct domain and the criterion construct domain (Linkage 4 in Figure 6.1 ). The first linkage involves obtaining data to support interpreting variance in predictor scores as reflecting variance in a specific predictor construct domain. This reflects one form of what has historically been referred to as construct validity evidence, namely, amassing theory and data to support assigning a specified construct label to test scores. If a test purports, for example, to measure achievement striving, one might offer a conceptual mapping of test content and one's specification of the domain of achievement striving, paired with evidence of empirical convergence with other similarly specified measures of the construct. However, showing that the measure does reflect the construct domain is supportive of the predictive inference only if the predictor construct domain can be linked to the criterion construct domain. Such evidence is commonly logical and judgmental, though it may draw upon a body of empirical research. It requires a clear articulation of the basis for asserting that individuals higher in the domain of achievement striving will have a higher standing on the criterion construct domain than individuals lower in achievement striving.

Thus, there are multiple routes to establishing the predictive inference. These are not mutually exclusive; one may provide more than one line of evidence in support of the predictive inference. It is also not the case that the type of measure dictates the type of evidentiary strategy chosen.

In conclusion, we have attempted to develop six major points about validity. These are that (1) we have moved far beyond early conceptualizations of validity as the correlation between test scores and criterion measures; (2) validity is not a characteristic of a test, but rather refers to inferences made from test scores; (3) we have moved from conceptualizing different types of validity to a perspective that there are different types of validity evidence, any of which might contribute to an understanding of the meaning of test scores; (4) the key inference to be supported in employment settings is the predictive inference, namely, that inferences about future job behavior can be drawn from test scores; (5) although evidence about unintended negative consequences of test use (e.g., negative applicant reactions to the test) may affect a policy decision as to whether to use the test, such evidence is not a threat to the predictive inference and does not affect judgments about the validity of the test; and (6) there are multiple routes to gathering evidence to support the predictive inferences. Our belief is that a clear understanding of these foundational issues in validity is essential for effective research and practice in the selection arena.

Strategies for Establishing the Predictive Inference

In this section we examine three major categories of strategies for establishing the predictive inference. These have been introduced above and reflect the three ways of linking observed predictor scores with the criterion construct domain that are reflected in Figure 6.1 . The first category pairs Linkage 1 (evidence of an empirical linkage between observed predictor scores and observed criterion scores) with Linkage 4 (evidence that the measured criterion reflects the intended criterion construct domain) and reflects the use of some form of criterion-related validity evidence. The second involves Linkage 5 (evidence that the predictor contains content from the criterion construct domain) and involves making use of content-oriented validity evidence to establish that the observed predictor measure represents a sample (or simulation) of behavior from the criterion construct domain. The third involves pairing Linkage 2 (evidence that the observed predictor measure reflects the predictor construct domain of interest) and Linkage 4 (evidence of a relation between the predictor and criterion construct domains).

Using Criterion-Related Validity Evidence to Support the Predictive Inference

The relation between an observed predictor and the criterion construct domain can be established by demonstrating evidence for a pair of linkages in Figure 6.1 : the observed predictor–observed criterion linkage (1), and the observed criterion–criterion construct domain linkage (4). Myriad strategies for garnering evidence of these linkages exist under the rubric of establishing criterion-related validity evidence (Binning & Barrett, 1989 ). Conceptually, these strategies can be grouped into four general categories ranging from concrete to abstract in terms of the evidence they provide (Landy, 2007 ). At the most concrete end of the continuum are local criterion-related validation studies involving the specific predictor measure(s) one plans to use locally, as well as criteria for the local job (Campbell & Knapp, 2001 ; Sussmann & Robertson, 1986 ; Van Iddekinge & Ployhart, 2008 ). At the most abstract end are meta-analytic validity generalization (VG) studies that may not include the specific predictor measure(s) one plans to use locally, nor criteria for the local job (McDaniel, 2007 ; Schmidt & Hunter, 1977 ; Pearlman, Schmidt, & Hunter, 1980 ). In between these extremes are validity transportability studies (Gibson & Caplinger, 2007 ; Hogan, Davies, & Hogan, 2007 ) and synthetic validation studies (Hoffman, Rashkovsky, & D'Egidio, 2007 ; Johnson, 2007 ; Scherbaum, 2005 ; Steel, Huffcutt, & Kammeyer-Mueller, 2006 ) involving the specific predictor measure(s) one plans to use locally but not necessarily criteria for the local job.

Before discussing these strategies in more detail, we provide two points of clarification. First, the categories above do not define all strategies that can be used to establish criterion-related evidence; there are many creative examples of hybrid strategies. For example, Hogan et al. ( 2007 ) combined meta-analytic VG and validity transportability strategies to provide criterion-related evidence for a personality measure. Another example is work that has examined Bayesian methods for combining evidence from a local criterion-related validity study with preexisting meta-analytic evidence (e.g., Brannick, 2001 ;Newman, Jacobs, & Bartram, 2007 ; Schmidt & Raju, 2007 ). One other example is work by Hoffman, Holden, and Gale ( 2000 ) in which evidence based on all four strategies noted above was used to support inferences regarding the validity of cognitive ability tests for a local set of jobs. Thus, although we discuss each strategy separately, we encourage practitioners to build on evidence from multiple strategies to the extent that their local situations permit. Blending evidence from multiple strategies can help offset weaknesses inherent in any single strategy and therefore strengthen one's position to defend the predictive inference.

A second clarification is that although each strategy can be used to establish criterion-related evidence, they differ in terms of the degree to which they directly address Linkages 1 and 4. This is due in part to differences in their level of abstraction noted above. For example, a well-designed local criterion-related validation study with a sufficient sample size can do an excellent job of evaluating empirical evidence of Linkage 1 (i.e., observed predictor–observed criterion link). Nevertheless, Linkage 4 (i.e., observed criterion–criterion construct domain link) is also critical for establishing the predictive inference and has a tendency to be given less emphasis in the context of local studies (Binning & Barrett, 1989 ). As another example, one may argue that the typical meta-analytic VG study does not address Linkage 1 or 4 directly but only in the abstract. Specifically, meta-analytic studies often focus on links between predictor constructs or methods (e.g., conscientiousness, employment interviews) and general criterion constructs (e.g., task performance, contextual performance). We suggest that a meta-analysis that is conducted at the construct level, as in the above example, is better viewed as providing evidence for Linkage 3 (i.e., a relation between predictor and criterion construct domains), a point we develop in the subsequent section on Linkage 3. In addition, the measures used in a given meta-analysis may or may not capture one's local predictor measure and local criterion domain . For example, although a meta-analysis may focus on conscientiousness-task performance relations, not all measures of conscientiousness function similarly (Roberts, Chernyshenko, Stark, & Goldberg, 2005 ), and the types of tasks that serve to define the domain of “task performance” on the local job can be quite different from tasks on other jobs and have implications for prediction (Barrett, 2008 ; Barrett, Miguel, Hurd, Lueke, & Tan, 2003 ; Tett & Burnett, 2003 ). Thus, leveraging results of a meta-analysis of predictor construct domain-criterion construct domain relations to establish the job relatedness of an observed predictor measure in one's local situation may present many challenges (McDaniel, 2007 ; Oswald & McCloy, 2003 ; Sackett, 2003 ).

Local Criterion-Related Validation Studies

Local criterion-related validation studies involve gathering data on a predictor measure from a local sample of applicants or incumbents, and examining the empirical relation (e.g., Pearson correlation or other indicator of effect size) between scores on the predictor measure and criterion data gathered for those individuals (e.g., ratings of their job performance, work sample tests). The strength of this relation can provide a direct assessment of evidence for Linkage 1 (i.e., the observed predictor–observed criterion link).

Depending on the nature of the validation sample and study design, adjustments often need to be made to the statistic summarizing the observed predictor-criterion relation (e.g., Pearson r for continuously scaled criteria, Cohen's d for binary criteria, such as turnover) to estimate the observed relation for the applicant population of interest (Bobko, Roth, & Bobko, 2001 ; Sackett & Yang, 2000 ). In other words, the goal of a local criterion-related validation study is not to calculate the observed relation in the local sample, but to estimate how strongly we would expect the observed predictor and criterion to be related if the predictor were administered to the entire applicant population and all of those applicants were subsequently observed on the job.

For example, if one's local validation sample comprised job incumbents, and those incumbents were originally selected into the organization on a test that is correlated with the predictor measure, then the observed relation between the predictor and the criterion would be attenuated due to indirect range restriction stemming from selection on a test related to the predictor of interest. In contrast, if incumbents in the local study happened to be selected on the predictor of interest, then the observed predictor-criterion relation would be attenuated due to direct range restriction stemming from selection on the predictor itself. These examples represent two ways that range restriction can manifest itself and influence observed predictor-criterion relations. Sackett and Yang ( 2000 ) provide a thorough overview of ways in which range restriction manifests itself in the context of local criterion-related validation studies, as well as how such effects can be accounted for when estimating criterion-related validity coefficients (see also Van Iddekinge & Ployhart, 2008 for an overview of recent developments).

Although establishing evidence of observed predictor-observed criterion relations (Linkage 1) is a focus of local criterion-related validation studies, one must also demonstrate that the criterion measure adequately captures the latent criterion domain (Linkage 4), because doing so allows one to establish evidence for the primary predictive inference of interest (i.e., the observed predictor–criterion construct domain relation). This is a matter of (1) establishing that the criterion measure is reliable and neither systemically contaminated with irrelevant variance nor deficient in terms of its coverage of the criterion domain, and (2) correcting the observed predictor-criterion relation for unreliability in the criterion measure when appropriate to do so.

Given the critical role of the observed criterion in a local validation study, the quality of the criterion measure should be addressed at the outset when determining the feasibility of the study, and should remain of concern during subsequent criterion measure development. Although the apparent simplicity of falling back on post hoc corrections to account for unreliability in one's criteria is appealing (Schmidt & Hunter, 1996 ), practitioners need to be cognizant of the debate regarding the appropriateness of such corrections (e.g., Murphy & DeShon, 2000 ; Schmidt, Viswesvaran, & Ones, 2000 , see also Mackenzie, Podsakoff, & Jarvis, 2005 for an alternative perspective on this issue). Furthermore, such corrections also make testing the statistical significance of validation findings less straightforward (e.g., Bobko, 1983 ; Hakstian, Schroeder, & Rogers, 1988 ; Raju & Brand, 2003 ). Lastly, although unreliability in criterion measures can potentially be corrected for, systematic contamination and deficiency generally cannot be, and practitioners should limit these sources of error through careful criterion measure development (Oswald & McCloy, 2003 ).

Unfortunately, developing criteria for use in local validation studies is easier said than done. Indeed, the lack of quality criteria is one of the greatest impediments to executing a local criterion-related validation study, as measuring job performance has been one of the most vexing problems our field has faced over the past century (Austin & Villanova, 1992 ; Bennett, Lance, & Woehr, 2006 ; Jenkins, 1946 ; Murphy, 2008 ). 1 Although the quality of criteria is often mentioned as critical to validation efforts, there appear to be few clear standards and rules-of-thumb that practitioners can use when evaluating whether their criteria provide adequate enough measurement of the local criterion construct domain to conclude that there is sufficient evidence of Linkage 4.

For example, if a set of performance rating scales addresses all of a job's critical task clusters, but for 3 of 10 task clusters the interrater reliability of ratings is in the 0.20 range, has the practitioner adequately measured the criterion domain of task performance for that job? Similarly, if a practitioner can legitimately measure only 6 of 10 critical task clusters for a job (because some of them do not lend themselves to assessment via rating), has the practitioner sufficiently covered the criterion domain for that job? If a practitioner ignores task clusters altogether but simply asks each incumbent's supervisor to answer five questions regarding the incumbent's overall task performance and achieves a coefficient alpha reliability estimate of 0.95 for that measure, do the resulting scores provide adequate coverage of the criterion domain for the job in question? Unfortunately, for any given question above, our guess is that experts’ opinions would vary due in part to lack of clear standards for linking the psychometric properties of criterion measures to judgments regarding the veracity of Linkage 4. Complicating matters further, depending on the circumstances (e.g., litigiousness of the local environment), experts might come to different conclusions even if presented with the same evidence. Just as concepts of utility have assisted practitioners when interpreting and evaluating the magnitude of validity coefficients in practical terms (i.e., the psychometric basis of evidence of Linkage 1), it would be useful to have metrics that help evaluate the magnitude of psychometric indicators of evidence for Linkage 4 in practical terms.

Another common challenge of executing a local validation study is obtaining a sufficiently large sample size. For example, the number of job incumbents or applicants might not be large enough to support a study that would have sufficient power to detect predictor-criterion relations of what a given decision maker views as a minimally acceptable size (e.g., one decision maker might wish to detect a Pearson correlation of 0.10, whereas another's threshold is 0.20), or the local organization might not have the resources to support such an effort. To help determine the feasibility of executing a local criterion-related validation study, formulas exist for estimating the minimal samples sizes required to detect various levels of effect size with a given level of power (e.g., Cohen, 1988 ; Murphy, Myors, & Wolach, 2009 ). Nevertheless, practitioners relying on such formulas should be cognizant that they are based on sampling theory underlying uncorrected statistics (e.g., raw Pearson correlations). The sampling variance of corrected validity coefficients will be larger than the sampling variance of uncorrected coefficients (e.g., Aguinis & Whitehead, 1997 ; Bobko, 1983 ; Chan & Chan, 2004 , Raju & Brand, 2003 ). Thus, sample sizes required to detect statistically significant corrected predictor-criterion relations of a minimally acceptable size will be larger than sample size estimates provided by traditional power formulas.

For example, assume criterion-related validity estimates for two predictors are based on the same sample size: an uncorrected estimate of 0.10 for predictor A and a corrected estimate of 0.10 for predictor B. The power for detecting a significant correlation for predictor B will be lower than the power for detecting a significant correlation for predictor A, the result of a higher sampling variance associated with predictor B's corrected correlation. To achieve equal levels of power, one would need to increase the sample size for predictor B to offset the additional sampling error variance associated with its corrected correlation. When planning local criterion-related validity studies, this observation can be important because one is typically interested in the power to detect a corrected relation of a given magnitude—not an uncorrected relation of a given magnitude.

In addition to issues of criterion quality and sample size, practitioners carrying out a local criterion-related validation study must be cognizant of several other issues that can affect the quality of their study and generalizability of its results to the applicant population of interest. Key considerations include (1) representativeness and characteristics of the validation sample (e.g.,. demographic composition, applicant versus incumbents, representation of specialties within the target job); (2) design of the validation study (e.g., predictive versus concurrent, timing of predictor and criterion administration; Sussmann & Robertson, 1986 ); (3) theoretical relevance of the predictors(s) to the criterion domain; (4) psychometric characteristics of the predictor(s); (5) predictor combination, sequencing, and staging decisions (De Corte, Lievens, & Sackett, 2006 ; Finch, Edwards, & Wallace, 2009 ; Sackett & Roth, 1996 ); (6) criterion combination and weighting issues (Murphy & Shiarella, 1997 ); and (7) statistical correction, estimation, and cross-validation issues (Society for Industrial and Organizational Psychology, 2003 ).

Though SIOP's 2003 Principles offer general guidance on key considerations for executing criterion-related validation studies, they are not very specific and will necessarily become dated as new methods are developed. Indeed, many of the areas on which the SIOP Principles offer guidance reflect active research areas. In response to a lack of specific guidance in the Principles and other general professional guidelines (e.g., American Educational Research Association, American Psychological Association, and National Council on Measurement in Education 1999 ; Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice, 1978 ), Van Iddekinge and Ployhart ( 2008 ) recently summarized developments between 1997 and 2007 regarding several key issues pertinent to designing and conducting local criterion-related validation studies. Their work covers topics such as validity coefficient correction procedures, evaluation of multiple predictors, influences of validation sample characteristics, and myriad issues surrounding the quality and characteristics of validation criteria. A review of Van Iddekinge and Ployhart ( 2008 ) serves as an important reminder that the scientific literature has yet to provide clear guidance on several areas of selection practice. Thus, practitioners undertaking local criterion-related validity studies should not simply rely on established guidelines, but should also consider recent developments in the scientific literature to ensure that they base their practice on the latest available knowledge.

Because criterion issues, sample size, and resource constraints can limit the feasibility of conducting local criterion-related validation studies, practitioners must often consider alternative strategies for establishing criterion-related validity evidence (McPhail, 2007 ). Executing a criterion-related validity study requires consideration of numerous factors that can influence the quality of results. However, if the resources are available, relative to the other criterion-related strategies we will discuss below, local studies can offer more direct control over factors that influence results (e.g., the quality and relevance of predictor and criterion measures used, study design features) and allow one to locally estimate and correct for statistical artifacts that may affect their results (e.g., criterion unreliability, range restriction)—as opposed to making assumptions about the degree to which such artifacts manifest themselves based on past research. Having more (or simply clearer) control over such factors is important in that it allows researchers to address a priori (through explicit development and design decisions) potential weak spots in the arguments they make for the veracity of Linkages 1 and 4. In contrast, when adopting validity generalization and transportability strategies, the local practitioner is constrained more by the quality of the decisions made by the authors of the meta-analysis in question and the characteristics of the primary studies underlying it, or in the case of a transportability study, the characteristics of the original validation study.

Validity Generalization

Since Schmidt and Hunter's seminal VG work in 1977, the use of meta-analytic VG strategies for establishing criterion-related validity evidence has gained prominence. Indeed, the 2003 SIOP Principles formally recognize VG as a strategy for establishing criterion-related evidence. Schmidt and colleagues observed that nontrivial portions of the variability in criterion-related validity estimates across local studies could be attributed to sampling error and artifacts associated with the execution or design of the study (e.g., differences in the reliability of criterion measures, differences in the degree of range restriction across studies). Indeed, Schmidt and Hunter, and other early proponents of VG, found that once variability due to sampling error and other statistical artifacts were accounted for, the variability in criterion-related validity of cognitively loaded tests across studies was greatly reduced (e.g., Pearlman, Schmidt, & Hunter, 1980 ).

Statistically speaking, evidence that validity estimates generalize based on results of a meta-analysis is established by comparing a cutoff point on the lower end of the distribution of operational validity estimates (i.e., a distribution of validity coefficients that has the effects of sampling error and other artifacts, such as criterion reliability and range restriction, removed from its variance) to zero. This cutoff point has historically corresponded to the lower bound of a meta-analytic 80% credibility interval (i.e., the level of validity associated with the tenth percentile of the operational validity distribution; Hunter & Schmidt, 1990 ; Whitener, 1990 ). 2 If one finds that the lower bound of the credibility interval exceeds zero, then one can conclude that validities for the predictor or measurement method generalize across the conditions reflected in the studies included in the meta-analysis (Murphy, 2003 ). It is critical to note that this does not imply that (1) validities will generalize across all conditions (including the situation that may be of interest to a local practitioner), (2) moderators of the magnitude will not be present, as the credibility interval can still be quite wide, or (3) the validity will be of practically meaningful size for one's local situation (e.g., a validity of 0.01 would not be very meaningful if number of hires is small and testing costs are high). It does suggest that one can be reasonably confident that validities will be greater than zero under conditions reflected in the studies included in the meta-analysis.

Since Schmidt and Hunter ( 1977 ), evidence that validities generalize for various predictor constructs (e.g., conscientiousness, integrity, cognitive ability) and predictor methods (e.g., structured interviews, biodata) under certain conditions has been presented (see Schmitt & Fandre, 2008 for a review). Nevertheless, simply because one concludes that validities generalize based on results of a given meta-analysis, it should not be taken as evidence that they generalize to one's local selection situation . Making the inferential leap between meta-analytic results for a given predictor and the specific use of that predictor in one's local setting depends on several factors beyond the statistical threshold noted above and traces back to characteristics of the meta-analytic database and the execution of the meta-analysis itself (e.g., see McDaniel, 2007 ; Oswald & McCloy, 2003 ; Sackett, 2003 ; SIOP, 2003 for a review of key considerations).

There are two sets of factors to be considered when attempting to generalize meta-analytic evidence to one's local situation. One set deals with issues of comparability between the local situation and the studies included in the meta-analytic database, such as the (1) comparability of jobs in the meta-analysis to the local job, (2) comparability of predictor measures in the meta-analysis to the local predictor measure, and (3) comparability of criterion measures used in the meta-analysis and the criterion domain for the local job. The other group deals with statistical considerations such as (1) the appropriateness of statistical corrections used in executing the meta-analysis, (2) the power of the meta-analysis to detect moderators of predictor-criterion relations, and (3) bias in the reporting of findings underlying the meta-analysis (e.g., publication bias). Any time practitioners consider using meta-analytic findings to support inferences regarding predictor-criterion correlations in their local setting, they should give serious consideration to each of the issues above. Failing to adequately address them can substantially weaken one's arguments for Linkages 1 and 4 discussed earlier. In fact, we will argue that in many cases it is conceptually more appropriate to view meta-analytic findings as evidence for Linkage 3.

The first group of considerations noted above addresses the importance of considering whether one's local situation is adequately reflected in the meta-analytic database. Typically, making this determination involves identifying clear boundary conditions for generalizations warranted by the given meta-analysis (e.g., boundary conditions dealing with content, structure, and scoring of predictors and criteria, and study setting and respondent type), and assessing whether conditions defining one's local situation fall within those bounds. Unfortunately, sufficient details that allow for a clear definition of boundary conditions pertinent to one's local situation may often be unavailable from published meta-analyses (e.g., see Schmitt & Sinha, 2010 for a critique of existing selection-oriented meta-analyses). Although the coding of studies underlying a published meta-analysis might be sufficient to establish boundary conditions from a general scientific perspective, it might not be of sufficient detail to inform local practice. As such, local practitioners might often need to consider establishing these boundary conditions post hoc to determine whether they can make defensible inferences regarding predictor-criterion relations for their local situation. For example, this process might involve requesting a copy of the meta-analytic database from the original authors, recoding studies as needed to clarify boundary conditions most pertinent to one's local situation, and potentially rerunning analyses in a way that facilitates an evaluation of VG evidence (e.g., excluding studies that are highly irrelevant to one's local situation, conducting a more “local” meta-analysis per Oswald & McCloy, 2003 ).

Although the issues raised above may be most salient when attempting to generalize evidence from meta-analyses published in the scientific literature, even those meta-analytic studies in which the boundary conditions are more in line with one's local situation require careful scrutiny by practitioners. For example, consider a meta-analysis conducted by a test vendor regarding one of its off-the-shelf tests that it sells for purposes of selecting entry-level hires in a given job family. In this case, suppose the vendor maintains a database of local validation studies specific to the test and job family in question. A local practitioner wants to make a VG argument for use of the test in his or her local situation based on the vendor's meta-analytic results, and the job in question is the modal job represented in the vendor's meta-analysis. In such cases, boundary conditions may be less of an issue, but statistical issues may limit the strength of evidence one has for Linkages 1 and 4. For example, McDaniel, Rothstein, and Whetzel ( 2006 ) have advocated running trim-and-fill analyses to analyze the potential for publication biases in light of their findings that criterion-related validity information reported by some test publishers may overstate validity evidence for their assessments. For example, some validation efforts may be suspended by the client organization if preliminary findings do not look promising, such that a technical report is never written. If bias is revealed through such analyses, it represents a weakness in the practitioner's claim for evidence of Linkage1, which could be exploited by an opponent should use of the predictor be legally challenged.

Transportability Studies

If one does not have the resources to conduct a local criterion-related validity study and the body of meta-analytic evidence does not appear to be sufficient, another possibility is to perform a transportability study. The idea behind a transportability study is to use criterion-related validity evidence established for a given assessment in another setting (e.g., another job, another organization), and “transport” that evidence to the local situation by establishing their comparability. Most often, establishing comparability means establishing evidence that critical tasks and/or knowledges, skills, abilities, and other characteristics (KSAOs) are similar across jobs. If one is practicing in the United States, the Uniform Guidelines on Employee Selection Procedures clearly express a preference for establishing similarity in terms of job tasks or work behaviors (Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice, 1978 ).

There are multiple methods for establishing comparability of tasks and KSAOs across jobs, or more generally job similarity (e.g., Hoffman et al., 2000 ; Johnson & Jolly, 2000 ; see also Gibson & Caplinger, 2007 for a review). Ideally, data on a common job analysis questionnaire for both jobs in question (i.e., the one for which criterion-related validity evidence exists and the local job of interest) would allow a direct comparison of similarity.

In addition to addressing similarity of the jobs in question, another key to making a transportability argument is the quality of the study in which the original validity estimate was calculated. For example, even if comparison of job analysis data revealed that the jobs were very similar, if the original local criterion-related validity study was flawed or paid inadequate attention to the numerous factors that can affect the results of local criterion studies (reviewed earlier), then using the original validity evidence to defend Linkages 1 and 4 for the local job would be suspect.

As a more concrete example, imagine a situation in which statistically significant criterion-related validity coefficients were found in the original study. Furthermore, assume that the comparison of job analyses revealed that the current job of interest and the job from the original study were highly similar in terms of critical job tasks, but that the observed criterion measure in the original study was inadequate. Thus, despite having statistically significant validity estimates and job analysis data that are comparable, the inference of primary interest (i.e., the relation between the observed predictor and criterion construct domain) is weakened by the fact that evidence of Linkage 4 is not adequately established.

A key challenge in executing a transportability study is access to information regarding the original study. When conducting a local criterion-related validity study one has complete access to all materials and methodology surrounding that study. If someone is transporting criterion-related validity evidence based on a study that was conducted by another organization or test publisher, then the local practitioner depends on the quality of the original study's documentation as well as any additional details regarding the study that its owners are willing to share. This is akin to the situation noted above when one is attempting to generalize evidence based on results of a meta-analysis in which they depend on the level of documentation available for the meta-analytic database to establish boundary conditions. Without sufficient details about the original study, the local practitioner may be unable to evaluate the original work sufficiently, thus weakening his or her arguments for the veracity of Linkages 1 and 4 for the local job. As such, transporting validity evidence using this strategy may prove even more challenging than conducting a local criterion-related validation study one still carries the burden of considering all the issues affecting the quality of a local study as well as the burden of justifying the comparability of the job(s) in question.

Synthetic Validation

Another strategy for establishing evidence for Linkages 1 and 4 involves synthesizing validity evidence. As with transportability studies, synthesizing validity evidence can take on multiple forms (see Johnson, 2007 ; Scherbaum, 2005 ; Steel et al., 2006 for reviews), but there is a general recognition that regardless of strategy, three elements are required (Guion, 1965 ): (1) identifying performance components for a set of jobs, (2) establishing relations between one's predictor measures and those components, and (3) establishing relations between one's predictor measures and a composite of the performance components specific to a given job. Practically speaking, estimating criterion-related validity evidence for a predictor (or predictor battery) and evaluating evidence of Linkage 1 require estimating four types of parameters: (1) correlations among the predictors, (2) correlations among the job components, (3) correlations between predictors and job components, and (4) some indicator of which components are required for a given job and their weighting for that job. These estimates may be based on data collected by the practitioner (e.g., via a local study), judgments provided by subject matter experts, or a combination of both (e.g., Peterson, Wise, Arabian, & Hoffman, 2001 ).

Synthetic validity approaches have the potential to be particularly useful when developing selection systems for multiple jobs within an organization—a situation that lends itself to capitalize on economies of scope from the perspective of job analyses and selection measure development and validation. For example, because the initial data collection focus of synthetic validation efforts is not on estimating relations between predictor measures and overall job performance on any given job, but rather on estimating relations between predictor measures and individual performance components that multiple jobs share in common, the sample size requirements for any single job are reduced relative to a local criterion-related validity strategy in which a single job is examined. With the correlations among the job performance components and predictors, it is a relatively straightforward exercise to estimate the validity of a predictor (or battery comprising multiple predictors) for predicting a composite of job performance components (i.e., an overall job performance composite) for any given job (see Hamilton & Dickinson, 1987 for a review of methods).

Although a synthetic validation approach offers an important option for practitioners faced with establishing criterion-related evidence for multiple jobs, the approach rests on several assumptions. The first assumption (relevant to Linkage 1) is that the relation(s) between the predictor(s) and a given job performance component is invariant across jobs. Of course, one would expect observed correlations between predictors and job performance components to vary as a function of various artifacts (e.g., sampling error, differences in range restriction), but beyond such artifacts, such correlations would be expected to be invariant. It is this assumption that allows one to conclude that the predictor-job performance component relation based on incumbents from multiple jobs provides an accurate estimate of the predictor-job performance component relation among incumbents for any job that involves the given performance component. In this way, synthetic validation approaches can help practitioners reduce the data collection demands on any single job that typically hamper carrying out a local criterion-related validity study.

A second assumption (relevant to Linkage 4) is that the set of performance components reflected in the synthetic validity database adequately reflects the performance requirements of a given job of interest. Consider a synthetic validity study in which the bulk of jobs in question share many performance components, but several jobs have multiple, unique (yet critical) performance components. For example, such jobs might include those that have specialized knowledge, skill, and ability requirements that are particularly salient to effective job performance. The fact that such jobs have unique components that are not reflected in the synthetic validity database, yet are critical to effective performance, weakens the evidence of Linkage 4 for the jobs in question. Thus, although synthetic validation approaches can help reduce the sampling burden imposed on any single job, they do not reduce the burden on practitioners to demonstrate evidence of Linkage 4, which can prove difficult if the performance components underlying one's synthetic validity database do not adequately capture the performance domain of interest.

At the outset of this section, we identified four general strategies for establishing criterion-related validity evidence. All of these strategies can be viewed as directly or indirectly addressing Linkages 1 and 4 in Figure 6.1 as they pertain to one's local situation. A common theme running through our review of strategies was the importance of Linkage 4 (i.e., the observed criterion–criterion construct domain linkage), which often tends to receive far less attention when discussing criterion-related strategies relative to Linkage 1 (see Campbell & Knapp, 2001 , for a discussion of the Army's Project A, perhaps the best example of a study in which this linkage has received appropriate attention). Using Figure 6.1 as an example, we, as a field, arguably need to be more explicit about the box representing the criterion construct domain on the lower right. From a scientific perspective, this is often conceived as representing a latent, global job performance construct (e.g., task or contextual or adaptive performance), but what is of interest to the local practitioners is the latent, local job performance construct (i.e., what defines task performance on job X ). For example, if task performance on job X is defined primarily by artistic, creative activities and task performance on job Y is defined by physical, mechanical activities, then the local “task performance” domain is substantively very different across jobs. Clear guidelines for evaluating whether the local criterion domain is adequately represented in the observed criterion measures used in local studies, or in the body of studies contributing to VG-based arguments, continue to be a need for our field. Such guidelines would be of significant benefit when evaluating evidence for Linkage 4 for all the criterion-related strategies reviewed here.

Using Content-Oriented Validity Evidence to Support the Predictive Inference

The reliance on content-oriented validity evidence (historically referred to as “content validity;” we use both interchangeably here) ranks among the most contentious topics in I-O psychology, bringing to mind the idiom that “one person's trash is another person's treasure.” For some, it seems that the notion of content validity is a misnomer at best (Murphy, 2009 ) and has even served as a source of discontent for many years (Guion, 1977 , 2009 ). For others, it remains a useful concept for a particular type of validity evidence, however named (e.g., Tonowski, 2009 ). How can we reconcile these various viewpoints? This section presents some suggestions as to why the literature on content validity seems so out of line with what many practitioners experience in the everyday work setting and provides some notions to help one grapple with this rather awkward concept.

Most psychologists realize that we have two definitions of validity available to us. The first definition is the “proper” or “more technically correct” definition for validation provided by the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999 ): “Validity is a unitary concept. It is the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed purpose” (p. 11). Hence, at the core of validation lies the use and interpretation of test scores, a notion echoed in the literature by luminaries of the field (e.g., Cronbach, 1971 ; Messick, 1981 ). The second definition can be described as the “lay definition”—the one we tend to use when trying to discuss validity and validation with clients or friends: “Validity is the degree to which a test measures that which it is meant to measure.”

On most occasions, the two definitions do not create problems for us, but there are times when it is difficult to consider them simultaneously among the various types of validity evidence available. The second definition seems to precede the first in the sense that one must have a test before one can use it, and a test is constructed (whether poorly or well) to assess some domain, whether it be a concrete, finite content domain (“ability to compute sums of two two-digit numbers”) or a more general and abstract construct (“garden-variety anxiety”). In one sense, this is one of the points Guion ( 1977 ) made when he avowed that content validity “refers to psychological measurement” and that “content sampling may be considered a form of operationalism in defining constructs” (p. 1). That is, the process of identifying a domain of stimuli (items) and behaviors (responses to those stimuli) and identifying an appropriate means of measuring those stimuli and behaviors lies at the heart of any measurement development endeavor—it is the process of generating operational definitions for our constructs. 3 The second definition also seems to be the residence for content validity, asking whether we have adequately defined a content domain and sampled from it appropriately: “The purpose of a content validation study is to assess whether the items adequately represent a performance domain or construct of specific interest” (Crocker & Algina, 1986 , p. 218). This definition can be seen to reference Linkages 5 (sampling of observed predictor content from a criterion construct domain), 2 (representativeness of predictor content from an intended predictor construct domain), and 4 (representativeness of criterion content from an intended criterion construct domain). Use of the test has not yet appeared on the horizon.

Problems with the two definitions can arise, however, when we actually use our tests. Drawing proper inferences from a set of test scores is the sine qua non of test use, but the use of test scores and the appropriate measurement of constructs are two separate concepts that do not always coincide. We can think of situations in which a test has been deemed to adequately assess a construct but might be used improperly (e.g., using a test to obtain scores for members of a population for which it was not intended or normed—say, using the SAT to identify the most academically talented and gifted seventh graders), thus invalidating inferences we may wish to draw. Similarly, we can think of situations in which a measure fails to measure the intended construct but exhibits a rather healthy correlation with a criterion anyway (e.g., when faking on a personality instrument leads to inaccurate assessments of examinees’ standing on the traits of interest but yields scores that predict job performance because the job requires high levels of impression management). Finally, we should mention the situation that arises with empirically keyed measures in which we may have little knowledge of exactly what we are measuring with a particular item or set of items but use them anyway because they show useful predictive validity for one or more criteria of interest.

Does Content-Oriented Validity Evidence Tell Us Anything About Validity?

It is this notion—that tests comprising little job-related content (and thus are lacking in terms of Linkage 5) can possess substantial levels of validity (in the sense of empirical relations with a criterion—Linkages 2 and 3)—that Murphy ( 2009 ) used to argue that levels of content matching provide little indication about the validity of a given test, stating that sometimes tests with very little job-related content can exhibit correlations with a criterion of interest that match or even exceed the correlations obtained by tests that do show high levels of matched content (the upper left quadrant of his Figure 6.1 , p. 459). Murphy also stated that there might even be instances in which a test that does evidence high degrees of overlap with job content fails to predict performance well (as might occur when judges fail to use rating scales as intended when scoring assessment center exercises; this is the lower right quadrant of his Figure 6.1 , p. 459). All of this can lead to rather confusing, seemingly contradictory statements about content-oriented validation evidence and content-oriented test construction (Thornton, 2009 ). On the one hand, content validity has been viewed as either a form of construct validity or else nothing more than operational definitions of a construct (Guion, 1977 ) and has been criticized for providing little to no information about the validity of the measure in the sense of Linkages 2 and 3 (Murphy, 2009 ). On the other hand, developing tests that have strong overlap with a content domain (establishing a strong Linkage 5) is widely acclaimed. Surely, a test that was developed without reference to a specified domain of content would be viewed as invalid for most (if not all) purposes (i.e., a failure to establish either Linkage 3 or 5), and construction of tests with content that matches the content of the jobs for which they are used as predictors of performance can result in “substantial” validity (Murphy, 2009 , p. 455). Can we really ignore the content of a measure when marshalling validity evidence for it?

The Standards (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999 ) identify the various types of data used to establish evidence for the validity of the use of a test. To illustrate, Thornton ( 2009 ) cited the following material from the Standards : “A sound validity argument integrates various strands of evidence into a coherent account of the degree to which existing evidence and theory support the intended interpretation of test scores for specific uses . . . . Ultimately, the validity of an intended interpretation of test scores relies on all the available evidence relevant to the technical quality of a testing system” (p. 17). Test content appears among the types of evidence listed in the Standards , along with internal structure, relations with other variables including tests measuring similar constructs and criteria, and response processes. Thus, establishing appropriate content definition and sampling from these serve as a worthwhile validation activity.

Of course, sampling from the content domain must be done carefully and intelligently. Failure to adequately sample from a specified content or construct domain could well lead to instances that would fill the lower right quadrant of Murphy's ( 2009 ) Figure 6.1 —measures that show a strong match to job content but fail to predict job performance well. If content sampling on a test is too limited, the resulting measure could indeed be job related but rather strikingly deficient in domain coverage, thus leading to a mismatch between the content of the predictor measure (narrow, constrained content sample) and the content of the performance criterion (wider, less constrained content sample). The content in the criterion not reflected by the predictor content would serve as error variance in any prediction system—variance in the criterion that is unrelated to variance on the predictor. In terms of our Figure 6.1 , this would indicate a problem with Linkage 4, which in turn may limit one's ability to find evidence of Linkage 1. In such a situation, evidence of Linkage 5 could clearly be present, but one would be hard-pressed to find evidence of Linkage 1 (observed predictor-criterion correlation) for a reason that has nothing to do with the predictor itself, but rather is a problem with the criterion. Similar results would be obtained if a test developer limited sampling of a target construct to certain areas and failed to sample from others. In such a case, the content would be relevant but deficient, thus bringing into question any claim of content-oriented validity evidence for the test.

Determining which content to sample is the critical question when attempting to accumulate content-oriented validity evidence, and it is not always a straightforward matter. Imagine a situation in which an I-O psychologist and a statistician serve on a technical advisory panel for a project that will involve the development of work sample tests of job performance for a variety of jobs. The statistician might argue that the sample of tasks to assess via the work sample measure should be determined by a random draw from the domain of, say, 100 tasks. The I-O psychologist might disagree strongly with this notion, arguing that practical constraints require that we eschew strict random sampling of job tasks in favor of a more targeted identification of the most important and/or most frequently performed and/or most feasibly measured content. This might result in our drawing a sample of only 10% of the tasks performed on the job that represent 90% of the time spent by incumbents on the job. We believe it is reasonable to argue that the latter measure exhibits more useful content-oriented validity evidence than the former might, should we find ourselves on the bad end of a random draw where 8 of the 10 tasks sampled were rarely performed and/or relatively unimportant. The point here is that sampling is still being considered, but the dimension of sampling adequacy has changed from content to some other key dimension (e.g., time spent performing, importance). Good judgment and common sense will need to be applied to determine whether the resulting criterion measure can be said to be a content-valid measure of job performance.

Murphy ( 2009 ) argued that high levels of job relatedness (a strong Linkage 5) are not necessary for validity (which he essentially defined as criterion-related validity—most clearly Linkage 1, although Linkage 3 might also be considered). As a primary piece of evidence, he described a “mailroom mixup” in which two test batteries (one to select entry-level machine operators and one to select data entry clerks) get misassigned and sent to the opposite organizations. Murphy argued that “it is likely that the validity of these test batteries as predictors of performance would not have been markedly different if the right test batteries had been shipped in the first place” (p. 455). This is true, of course, because our predictor measures can assess broad constructs of abilities, skills, and other characteristics (e.g., personality traits, vocational interests) that might well be related to job performance but not necessarily job-specific the way that job-specific knowledge or skills might be. This does not mean that tests constructed to have high overlap with job content will be invalid—simply that this high degree of content overlap is not necessary (that is, Linkages 2 and 3 can still be strong).

Note, however, that for a test to have criterion-related validity but low levels of overlap in terms of job content, there must be overlap of something . The overlap in the instances provided by Murphy ( 2009 ) did not exist at the level of tasks or job knowledge, but it was there nevertheless. The overlap apparent in the Murphy article occurred at the latent trait level. So, the content has moved beyond the more finite realm of a knowledge or task domain to the arguably less tractable but no less important domain of more general cognitive abilities. Similar to the situation in which we argued that it was reasonable to target our sampling of tasks to focus on those that are most frequently performed or most important, one can shift the focus from job-specific content to latent content—that is, a shift from specific behaviors to more general abilities deemed necessary for successful job performance.

A test can adequately sample the specific knowledge required and tasks performed on the job (think of the electrical technicians knowledge test), or that test can adequately sample the specific latent traits thought necessary for successful job performance ( g , conscientiousness, agreeableness, whatever). Tying this back to our Figure 6.1 , the former would provide evidence of Linkage 5 and the latter would provide evidence of Linkage 2. Both can be viewed as having content validity in a broad sense, but the content represents different domains. The former regards specific job tasks and sampling from the criterion construct domain, and the latter regards constructs such as general abilities and basic skills and sampling from the predictor construct domain. Murphy's ( 2009 ) article (drafted to elicit comment) confounds these two domains (which represent different areas of our Figure 6.1 ) to argue that job relatedness is not a good indicator of a test's content validity. This should surprise no one. If the latent traits assessed by the test are required for successful job performance, we should expect such a test to have low job relatedness but high criterion-related validity. This in no way should suggest, however, that adequate sampling of content from the criterion construct domain is unimportant.

The “mailroom mixup” that Murphy offered is stacked in his favor. As Tonowski ( 2009 ) noted, there are likely very severe limits to such examination interchangeability if the tests assessed job-specific knowledge, as one might find with credentialing examinations (see also Putka, McCloy, Ingerick, O'Shea, & Whetzel, 2009 ). Imagine mixing up assessments for veterinary surgeons with examinations for fireplace/hearth installers. One would be thought mad to argue that the predictive validity of performance in the two professions would be little affected by the choice of test to administer to credentialing applicants. On the other hand, a measure of general cognitive ability would likely predict performance in each quite well (range restriction aside) because we believe intelligence is necessary for acquiring job-specific knowledge, whether that knowledge entails horse physiology or proper venting procedures for gas fireplaces.

Content Validity for What?

In a paper relating the importance of criteria to validity results, Weitz ( 1961 ) argued that researchers should consider the criterion as a parameter when one is interested in determining the impact of an independent variable. Although Weitz was speaking from an experimentalist's perspective and never mentioned a validity coefficient, his conclusions ring true for every psychologist conducting validation research. He showed that the outcomes obtained from an experiment investigating the effects of different types of verbal association on the mediation of transfer of learning depended substantially on the criterion chosen—the implication, of course, being that what one concludes about the validity of one or more predictor measures is a function of the criteria used to validate them. Weitz argued that better understanding our criteria allows us to better understand our predictors: “The measure of the dependent variable (the criteria) should give us more insight into the operation of the independent variable if we know the rules governing the operation of criterion measures” (p. 231). For the I-O psychologist, this may be interpreted to mean that knowing what determines variation in our criteria will allow us to better understand what might determine the variance in our predictors. Such understanding, in turn, would greatly aid both our choice of criteria and our choice of the predictors that would best predict those criteria (and thus bolster Linkage 1).

Jenkins ( 1946 ) authored a paper lamenting the availability of suitable criteria for use in validation studies. He titled his paper “Validity for What?” Drawing upon the notion that one must ask about the specific conditions under which validity inferences are made about the use of a particular measure, we advocate applying similar logic to the notion of content validity by asking “content validity for what?” That is, what content are we interested in capturing with our measure and thereby demonstrating its adequacy of sampling? Weitz argued that the choice of criterion variable was critical to the degree to which a test evidences criterion-related validity. Similarly, it is critical to understand the level of content match for us to determine whether a claim for content-oriented validity evidence can be made. There seems to be no reason to limit ourselves to highly detailed job-specific tasks and knowledge when evaluating measures for content validity. KSAO matches have also proven sufficient for reasonable levels of criterion-related validity, and thus the “content” would shift from tasks and knowledge (Linkage 5) to constructs underlying successful job performance (Linkage 2). This is important from the “how well we measure what we meant to measure” notion of validity. Given that, we almost certainly have a strong basis for making the correct inferences from the test (but of course we can still drop the ball and use the test improperly—thinking it is measuring something other than what it is, using it to obtain scores on the incorrect population, and so on).

When originally conceived, content validity seemed to be paired primarily with relatively well-defined knowledge or task domains, particularly those encountered in educational measurement settings. The application of content validity notions to constructs seemed less defensible, despite Guion's ( 1977 ) argument to the contrary. Today, we acknowledge that content relevance of a test may well extend to the notion of developing an adequate operational definition of a latent trait (construct). Thus, it seems reasonable to consider content validity from two levels of content specificity: (1) job-specific tasks, knowledge, and skills that would seem to generalize poorly to many (but not all) other jobs (e.g., knowledge of how to bleed brake lines or of interstate commerce laws; skill at troubleshooting a heat pump or motion sensor), and (2) constructs hypothesized to underlie successful job performance (e.g., general cognitive ability, conscientiousness, extroversion, dynamic strength) that might well generalize to many jobs. With the former notion, we are close to the original conceptualization of content validity and the evidentiary basis of Linkage 5—that is, the degree to which a test can be deemed to be valid for use in selection based on an adequate sampling of the behavioral domain reflected in criterion performance. With the latter notion, we reduce the magnification of our view, looking more generally at the broad construct representation of our test and determining its validity based on its assessment of key performance-related constructs and move more toward the evidentiary basis of Linkage 2 (and its concomitant requirement to demonstrate Linkage 3). We believe it is reasonable to consider that we appeal to content validity in those instances, too, but content of the underlying KSAOs, not in terms of the specific job content (necessarily). Strong job content overlap virtually guarantees reasonable predictive validity, assuming strong content sampling—indeed it is direct evidence of Linkage 5. Lack of job relatedness can also lead to strong criterion-related validity, assuming we have either (1) good predictor construct sampling (evidence of Linkage 2) and evidence that such predictor constructs relate to criterion constructs (evidence of Linkage 3), or (2) evidence of relations between our predictor and observed criterion (evidence of Linkage 1) and evidence that our observed criterion reflects the criterion construct domain of interest (evidence of Linkage 4).

The goal of this conceptualization is to overcome many of the seeming contradictory statements made in the literature about the validity of assessments that match job content to a great extent. Developing a test based on job content matching, as Murphy ( 2009 ) acknowledged, does seem to provide the test developer with a high probability of ensuring that the resulting test will be valid. As discussed, however, content matching alone is not sufficient to guarantee validity. Yet, content validity remains a useful concept if considered more broadly to include construct similarity rather than mere content similarity. As before, this notion of content validity retains the critical component of adequate domain sampling for the test in question, whether that sample is drawn from a domain of job-specific content or construct-related content. The notion of content validity as a necessary step in the creation of an adequate and defensible operational definition, and thereby an important part of construct validation, remains.

Another goal of adopting this view of content validity is to allow its continued use by organizations that find it infeasible to collect criterion-related validity evidence for the tests upon which they rely. Murphy ( 2009 ) lamented that there was a dearth of evidence in the arguments provided to his article that tests justified for use via a content validity argument demonstrated criterion-related validity. However, it seems clear that it is precisely in such situations that organizations depend so heavily on content validity. Security concerns or other logistical obstacles render collection of criterion-related validity data all but impossible. In these instances, arguing for the validity of a predictor on the basis of sound test development processes and hypothesized relations between the content assessed and the content to be predicted seems reasonable. Of course criterion-related validity evidence would be welcomed, but even there we hear the warnings of Schmidt, Hunter, and colleagues about the vagaries of local validity studies. Surely we have come to the point at which we can, in many instances, turn to our understanding of test development and the literature on relations between abilities and job performance to determine whether a particular test shows promise as a predictor in various circumstances.

Using Construct-Oriented Validity Evidence to Support the Predictive Inference

The focus in this section is on approaches that pair evidence of relations between predictor and criterion construct domains (Linkage 3) with evidence that the predictor measure reflects the predictor construct domain. (Linkage 2). The key linkage between predictor and criterion construct domains is generally viewed as difficult to establish, and the use of such a “construct validity” strategy is not widely used in operational testing settings. As other commentators have noted (Landon & Arvey, 2007 ), one major contributor to this is the influence of the 1978 Uniform Guidelines on Employee Selection Procedures , which compartmentalizes validation strategies into the traditional “trinitarian” approach (criterion-related, content, and construct) and specifies the evidence needed to support predictor use under each strategy. The label “construct validity” is applied to jointly establishing Linkages 2 and 3. The Guidelines note that there is little established literature on establishing these linkages, and that doing so is “an extensive and arduous effort involving a series of research studies, which include criterion-related validity studies” (Section 14D). This in effect took construct validity off the table in terms of providing local evidence of validity: If criterion-related validity evidence alone was an acceptable strategy for establishing the job relatedness of a predictor measure, why would anyone with the resources to conduct a criterion-related validity study ever undertake the more arduous burden of a construct validity strategy? In the absence of local evidence, the Guidelines also stated that a claim of construct validity based on evidence from another setting was permissible only when the evidence included a criterion-related study that met standards for transporting validity evidence (e.g., close similarity between the original setting and the local setting).

Our sense is that the Guidelines have had a chilling effect on the applied use of construct validity strategies. Efforts aimed at linking predictor and criterion construct domains have been undertaken more in the context of research aimed at a better understanding of predictor construct domains, criterion construct domains, and their relation than as applied validation efforts. We will briefly review three approaches to linking predictor and criterion construct domains, which we label “local empirical,” “generalized empirical,” and “rational/judgmental.”

Local Empirical Approaches to Linking Construct Domains

A major trend in data analytic strategies in recent years is the move from a focus on relations between observed variables to a focus on relations between latent variables. The prototypic example is the emergence of structural equations modeling (SEM) as a data analytic technique. SEM applications commonly involve two features: the theoretical specification and empirical testing of a measurement model and the specification and testing of a structural model. A measurement model specifies the relation between observed variables and latent constructs, whereas a structural model specifies the relation between latent constructs. For example, Vance, Coovert, MacCallum, and Hedge ( 1989 ) studied the prediction of performance of Air Force jet engine mechanics. They theorized that three predictor constructs (experience, capability, and support from supervisor) predicted the criterion domain of task performance. They first developed and tested measurement models for the predictor construct domain. For example, for the predictor construct of “experience” they obtained multiple measures, including months on the job and the number of times various tasks had been performed. Confirmatory factor analysis supported the loading of these observed experience indicators on a latent experience factor. On the criterion domain side, observed measures of task performance included self, supervisor, and peer ratings. Again, confirmatory factor analysis supported the loading of these on a task performance factor (in addition to also loading on rater source-specific factors) (i.e., Linkage 4). Once these measurement models were in place, Vance et al. could test a structural model linking the predictor latent constructs and the criterion latent constructs, and they demonstrated linkages between the experience and capability predictor domains and the task performance domain.

A variant on this approach is illustrated in Arvey, Landon, Nutting, and Maxwell's ( 1992 ) study of the physical ability domain in the context of police officer selection. Rather than separate measurement models for the predictor and criterion construct domains, followed by a structural model linking the two domains, they posited a measurement model in which predictor and criterion measures load on the same latent construct. They used job analytic data to posit that strength and endurance were two major constructs of importance for police officer performance. They selected a battery of eight physical tests (e.g., dummy drag, obstacle course, one-mile run), and collected supervisor ratings of on-the-job performance of eight physical activities. They then used SEM to show that one subset of the predictor tests and performance ratings loaded on a latent “strength” construct, whereas another subset loaded on a latent “endurance” construct. Thus, the demonstration that predictor tests and criterion ratings load on common latent constructs links the predictor and criterion domains.

These two examples involve elegant and insightful ways of arraying empirical data that could also be analyzed in a more traditional criterion-related validity framework. The observed predictor measures could have been correlated with the observed criterion measures, with job analytic data used to support the relevance of the criterion measures to the criterion construct domain. The construct approach gives a richer meaning to the pattern of findings than would be obtained with the simple correlation of predictor and criterion measures, and offers more useful insights to the field as a whole. That said, it is a local value judgment as to whether to focus on the observed variable or construct levels in conducting and reporting the analyses.

Generalized Empirical Approaches to Linking Construct Domains

A second approach to establishing Linkage 3 focuses on empirical evidence outside the local setting. We posit that the vast majority of meta-analyses in the selection domain focus on relations between predictor and criterion constructs (Linkage 3), rather than on predictor and criterion variables (Linkage 1). In fact, thinking about meta-analysis in light of Figure 6.1 helps make clear that there are four settings in which meta-analysis can be applied in the study of predictor-criterion relations:

(1) specific predictor measure – specific criterion measure . Whereas this would illustrate Linkage 1, we find that this approach is, in fact, quite rare. One example from an educational selection setting is the work of Sackett, Kuncel, Arneson, Cooper, and Waters ( 2009 ) in which meta-analyses of SAT-Freshman GPA correlations from 41 colleges were reported. Here the same predictor and criterion measures are used in all 41 studies, directly testing Linkage 1. Conceptually, this approach might be followed by a test publisher who uses a common criterion measure in multiple studies across various organizations.

( 2) specific predictor measure – criterion construct domain . Examples of this would be Hogan and Holland's ( 2003 ) meta-analysis using specific Hogan Personality Inventory scales as predictors of criteria categorized by subject matter experts as reflecting the two broad constructs of “getting ahead” and “getting along,” or Berry, Sackett, and Tobares's ( 2010 ) meta-analysis of relations between the Conditional Reasoning Test of Aggression and a variety of measures classified as reflecting a construct of counterproductivity. Note that such a meta-analysis can be seen as directly linking the operational predictor to the criterion construct domain.

( 3) predictor construct – specific criterion . An example of this would be Berry, Ones, and Sackett's ( 2007 ) meta-analysis of correlates of counterproductive work behavior. One analysis included in that article involves the relation between measures of the construct of organizational citizenship behavior and one specific measure of counterproductive work behavior [the Bennett and Robinson ( 2000 ) scale].

(4) predictor construct – criterion construct . This is what is examined in the vast majority of meta-analyses [e.g., studies of the relations between measures categorized as reflecting a predictor construct (general cognitive ability, conscientiousness) and measures categorized as reflecting a criterion construct (e.g., overall job performance, task performance, counterproductive work behavior)].

Although category “ 4” provides evidence for Linkage 3, evidence is also needed for Linkage 2 (the relation between the predictor measure and the predictor construct domain). Some domains are not defined with sufficient specificity to ensure that attaching a given label to a predictor permits drawing on the meta-analytic results in support of the predictor. For example, Sackett and Wanek ( 1996 ) note that Ones, Viswesvaran, and Schmidt ( 1993 ) reported a mean correlation of 0.45 (corrected for error of measurement) between the overt integrity tests examined in their meta-analysis, whereas Ones ( 1993 ) reported a mean correlation of 0.85 between the three overt tests contributing most heavily to the meta-analysis. This suggests that some tests with the integrity test label are not strongly related to others using that label, thus raising concerns about the generalization of meta-analytic findings across tests.

In sum, we view meta-analysis as currently the most prominently used nonlocal empirical approach to linking predictor and criterion construct domains. We refer the reader to this chapter's earlier discussion of issues that need to be taken into account in linking meta-analytic findings to one's local setting.

Rational/Logical Approaches to Linking Construct Domains

There are many settings in which it is necessary to develop and implement selection procedures but one is (1) not able to rely on local empirical evidence due to small samples; (2) not able to rely directly on meta-analysis, as existing meta-analyses do not reflect the predictor domain of interest or the criterion domain of interest, or include jobs and settings judged as similar to those of interest; (3) not able to transport validity evidence due to the unavailability of a strong validity study in a highly similar setting; and (4) interested in considering a broad range of predictor constructs, including those that do not lend themselves to the job-sampling logic that lies at the heart of content-oriented validation strategies. As a vivid concrete example, consider selecting astronauts for a future mission to Mars, with a focus on identifying personality attributes that will contribute to mission success.

A rational/logical approach collects a variety of types of information, which are integrated to make a case for the linkage between predictor and criterion construct domains. Job analysis is central, as identifying job tasks and attributes viewed by subject matter experts as influencing the performance of job tasks provides a foundation for establishing this linkage. Collecting such information in a standardized way and carefully documenting the bases for inferences drawn from the data contribute to making a persuasive argument about predictor and criterion construct domains. In the astronaut example, useful informants are likely to include astronauts with long-duration mission experience. Although their experience is with flights of markedly shorter duration than a Mars mission, they are better positioned than most others to have insights into coping effectively with long-duration isolation.

Another source of information is evidence from conceptually related settings. For example, both job analytic-related and criterion-related validity information might be obtained from other settings involving prolonged isolation, such as polar research stations in which crews “winter over.” Although one can certainly make the case for important differences between settings (e.g., smaller crews for space flight than for polar stations), such that one might stop short of arguing for the direct transportability of validity evidence, one can posit that these settings come closer to the setting of interest than do any others, such that findings from those settings merit careful consideration.

Another source of information would be broader findings from the literature, such as evidence on individual-difference correlates of performance in group or team settings. Meta-analytic evidence of consistency of personality-performance relations across settings, or of features of group settings that moderate these relations, would be informative.

The challenge for the selection system developer is in triangulating and integrating information from these various sources. Consistent patterns of evidence, such as commonalities of findings from job analytic efforts and empirical validation efforts in analog settings, would contribute to making a case for the linkage between domains.

In short, the rational/logical approach requires inferential leaps larger than are required in settings in which extensive local and directly generalizable validity evidence is available. The perspective taken is one of reaching the best conclusions one can about predictor-criterion construct domain linkages given the constraints and limited information.

Variants, Nuances, and Moving Forward

In the sections below, we describe several examples of the nuances of implementing concepts discussed earlier, as well as emerging ideas for enhancing traditional ways of thinking about validity. The purpose of the section is to emphasize that validation remains a developing area, that it is far more than a routine technology to be applied by following a cookbook, and that it is an area calling for careful thought and ingenuity for effective practice.

Predictor-Focused versus Criterion-Focused Validation

Within the empirical predictor-criterion linkage tradition, there are variants that are not commonly laid out clearly in textbook treatments. We will outline two broad strategies. There are many variants of each, but we will present prototypes of each. The first we label criterion-focused validation. In this approach the criterion is central, and the goal is to predict performance on this specific criterion. In this context one is essentially indifferent to the predictor. The rationale for predictor use is that it predicts the criterion of interest. In criterion-focused validation, the prototypical approach is to assemble a trial battery and collect predictor and criterion data. Weights for various predictors are often empirically determined (e.g., regression weights), though theory may drive the weights in settings in which there are insufficient data to produce stable weights. The concept of the “search for alternatives with equal validity and less adverse impact” would include looking at other predictor constructs, not just other ways of measuring the same construct. Cut scores are criterion-based: A cut is linked to specified levels of criterion performance.

The second strategy we label as predictor-focused validation. Here the predictor is of central interest: The goal is to justify the use of a given predictor. Job analysis often (but not always) plays a central role in predictor selection and weighting. Job analysis, or theory, or prior research (e.g., meta-analysis) leads to predictor choice in the case of selecting existing predictors. Job analysis also plays a pivotal role in settings in which predictors are explicitly custom-made for the setting in question. The selection of a criterion is done with the goal of “verifying” the validity of a theoretically chosen predictor. For example, the criterion would constitute job behaviors conceptually expected to be predicted by the predictor of interest. If there are multiple predictors of interest, different criteria may be selected for each (e.g., an interpersonally oriented criterion for personality measures, a criterion focused on the cognitive aspects of a job for cognitive ability measures). Cut scores may be based on levels of the predictor identified as needed for the job in question. The “search for alternatives” is narrow and focuses on other ways of measuring the predictor construct of interest.

We believe there is general preference for the criterion-focused approach, as there is general admonition to start by identifying the criterion of interest. But the predictor-focused approach is, we believe, relatively common, and at times reasonable. Consider the following scenarios. The first involves a setting in which a content-oriented approach guides development and implementation of a selection system and criterion-related validation becomes feasible only after enough time passes for a sufficient number of individuals to have been selected. The predictor is in operational use, and thus the focus of the validation effort is on evaluating the appropriateness of a currently operational predictor. The second scenario involves a setting in which an operational selection system exists that has been evaluated quite thoroughly via various validation studies. As a result of monitoring the literature, organizational decision makers become interested in a new predictor, and wonder whether it might be usefully added to the current selection system. A validity study is undertaken to examine the relation between this new predictor and various work behaviors of interest. The third scenario involves efforts by a commercial test publisher to document the relation between a predictor they wish to market and various criteria in different settings.

Whether an effort involves a predictor-focused validation or a criterion-focused validation may affect a number of choices made in conducting validation research. For example, in criterion-focused validation, all predictors under consideration are evaluated in terms of their contribution to predicting a common focal criterion (or set of criteria). In contrast, in predictor-focused validation, different criteria may be used for various predictors. Rather than, say, using an overall performance measure as the criterion against which all predictors are evaluated, a firefighter selection system might develop domain-specific criterion measures with cognitive, physical, and interpersonal components, with predictors in each domain then chosen and evaluated in terms of their correlation with the criteria to which they are conceptually matched. This use of domain-specific criterion makes the issue of predictor weighting a judgmental one (e.g., based on job analysis) rather than an empirical one (e.g., based on regression weights). It also precludes a single overall validity coefficient for the selection system; however, we see no reason why a selection system's criterion-related validity must be summarized by a single number, rather than in terms of each predictor domain's contribution to the criterion domain to which it is relevant.

In sum, in a variety of settings, the validation effort may reasonably start with a focus on a predictor of interest. This does not mean that the choice of criteria is not central to the effort. The clear specification of the criterion construct domain of interest and the development of sound measures within that domain are essential to any validation effort.

Validity Streams versus Validation Studies

Previous sections of this chapter, as well as past treatments of methods for establishing validity evidence for selection assessments (e.g., Binning & Barrett, 1989 ; Guion, 1998 ; Ployhart, Schneider, Schmitt, 2006 ; Schmitt & Chan, 1998; SIOP, 2003), generally view validity evidence in static terms. That is, the focus is on establishing evidence of validity, documenting it, and using that documentation to defend the operational use of a selection measure or system. The current state of technology, both in terms of applicant tracking and testing systems, as well as enterprise-wide human resource information systems (HRIS), permits the use of a new model for establishing validity evidence and framing it as a potentially malleable characteristic that offers a metric for helping refine predictor content over time. Such a model has yet to be widely discussed in the scientific literature, in part because it is so drastically different from how science has traditionally advanced (i.e., via single sample studies or meta-analytic cumulations of them coming out of academe). Nevertheless, the model we describe below has much appeal and is within the grasp of organizations with the infrastructure and the I-O expertise to support it. 4

With the advent of automated applicant tracking and testing systems, coupled with enterprise-wide HRIS, it is feasible for a large organization to set up a system in which applicants’ assessment scores on experimental and/or operational predictor content are recorded to a database and, after fixed periods of time (e.g., quarterly, yearly after entry), data on the behavior of hired applicants (e.g., turnover, absenteeism, disciplinary actions, promotion rates, awards, job performance) are captured and written to the same database. In a large organization, the stream of applicants for high-density occupations is often fairly continuous, and so, too, is the stream of new hires. In such an environment, applicant assessment and criterion data would continuously be entering the database described above. Thus, an organization with such a database in place could run periodic queries to examine criterion-related validity evidence for predictor content (e.g., scale- and item-level) for predicting a broad swath of potential criteria of interest. Under such a system, the organization would have a continuous “stream” of validation data on which to capitalize. As such, the model described above has been referred to as a streaming validation model (Handler, 2004 ). Such a stream of data could be used in several ways to benefit both the hiring organization and the science of personnel selection. We review several potential benefits below that are beyond the reach of any single validation study or meta-analytic cumulation of single studies.

First, a streaming validation system would provide the organization with a mechanism for monitoring and tweaking the composition of assessment content over time based on item-level statistics that are updated on a continuous basis (e.g., item-level criterion-related validities, item-total correlations; see also DeMars, 2004 ; Donoghue & Isham, 1998 ). 5 Though it may be tempting to dismiss such an approach as an empirical “fishing expedition,” tracking and differentiating between content that functions effectively and ineffectively (and capitalizing on such information) would be based on large quantities of item-level data that could be revisited for stability continuously. In other words, the streaming model could provide a replicable empirical basis for inductive theory building regarding the functioning of content types that is not readily achievable through our current methods of scientific research (Locke, 2007 ).

Second, such a system could also be used as a mechanism for understanding changes in job requirements over time and indicating when job analysis data might need to be updated (Bobko, Roth, & Buster, 2005 ). Several previous researchers have noted that “the job” as an entity is becoming less stable (e.g., Howard, 1995 ; Illgen & Pulakos, 1999 ). Unfortunately, as Bobko et al. ( 2005 ) pointed out, there is little guidance with regard to how often organizations should check the accuracy of the job analysis data on which their selection systems are based. Under the streaming validation system, changes in criterion-related validity of content related to a specific competency could signal the need to revisit the competencies assessed in the selection process. For example, if the criterion-related validity for content associated with some competencies tended to increase over time (i.e., calendar time, not incumbents’ tenure), whereas the criterion-related validity for content associated with other competencies tended to decrease over time, it might suggest that the importance of these competencies to successful job performance had shifted since the original job analysis.

Third, a streaming validation system could provide a powerful mechanism for addressing one of the most vexing issues facing personnel selection research and practice today—the impact of response distortion on the functioning of operational noncognitive selection measures (e.g., Griffith & Peterson, 2006 ; Smith & Robie, 2004 ). For example, say that if in addition to having performance data entered into the system after new hires had been on the job for a fixed amount of time (e.g., 1 year), the system also required employees to complete the noncognitive measures they completed at entry. This would give the organization a steady stream of within-subjects data at the item level, which could be invaluable for refining the content of the noncognitive measures with the intent of making them more robust to the effects of response distortion. Over time, via content analysis methods, such a system would enable the organization to discover the types of item characteristics associated with noncognitive content that remain robust in operational settings (i.e., maintain their validity and other psychometric qualities) and noncognitive content that does not (e.g., White, Young, Hunter, & Rumsey, 2008 ). Such information could be used not only to refine existing content, but also to guide the organization's development of more robust content in the future. As alluded to above, we do not see this simply as a massive exercise in dustbowl empiricism, but rather a new replicable, empirical basis for inductive theory building regarding the functioning of noncognitive content that has only recently become possible given advances in human resources-related technologies.

Streaming validation data could have profound implications for how we think about evaluating and refining the psychometric quality of selection assessments. Indeed, under such a system, criterion-related validity evidence for an assessment would be in flux, could be tracked over time, and acted on in a way that was not possible before the advent of applicant testing and tracking systems and related HRIS technology. At another level, it suggests a fundamental shift in the way we think about the purpose of criterion-related validation efforts. Under a streaming validation model, the focus is not only on whether the predictive inference is based on a given snapshot of data, but also leveraging the stream of validation data to indicate what steps could be taken to (1) ensure those inferences remain valid, and (2) improve the validity of those inferences. In short, the streaming validation model gives organizations a way to continually track the effectiveness of their assessments and establish modifications for improving them. Given the benefits of such a model, as well the continual refinement and dissemination of the technology that makes such a model possible, we envision streaming validation systems becoming more common in the future, particularly in large organizations with the technological infrastructure and in-house (or contractor) I-O expertise to lay out and sell a vision for implementing them to executive leadership.

Need for a Validation Paradigm When Turnover Is a Selection Criterion

The validation concepts and strategies discussed in earlier sections of this chapter, as well as previous discussion of the development and validation of selection measures (e.g., Binning & Barrett, 1989 ; Guion, 1998 ; Schmitt & Chan, 1998), are largely built around the criterion domain being defined in terms of job performance. However, turnover is a critical criterion for many organizations (Griffith & Hom, 2001 ). Even if the selection system is carefully designed to increase the likelihood of hiring top performers, the utility realized by the organization from the selection system will be diminished if successful hires leave the organization prematurely (e.g., Green & Mavor, 1994 ). If one decides not to limit the criterion domain to job performance, but to also emphasize turnover, the validation paradigm discussed in this chapter might look quite different. To illustrate, consider Figure 6.1 and the following questions: How would Figure 6.1 look if turnover, rather that job performance, defined the criterion construct domain of interest? How would the strategies used to establish evidence of linkages within the revised model differ?

First and foremost, with turnover as an observed criterion, the parts of Figure 6.1 labeled observed criterion measure and criterion construct domain would be indistinguishable, because the domain in this case is reflected in the observed behavior of turnover. Thus, the upper and lower right quadrant of Figure 6.1 would become one.

Second, job analysis plays a critical role in justifying and establishing predictive hypotheses when job performance defines the criterion domain and, as such, plays a key role in validation strategies discussed in this chapter. Job analysis focuses on identifying critical job tasks and the KSAOs required for effectively performing them, and is a cornerstone of the dominant personnel selection development and validation paradigm (e.g., Guion, 1998 ; Ployhart, Schneider, & Schmitt, 2006 ; Schmitt & Chan, 1998). However, based on the existing turnover and person-environment fit literatures (e.g., Griffith, Hom, & Gaertner, 2001 ; Kristof-Brown, Zimmerman, & Johnson, 2005 ), KSAOs identified through job analysis are unlikely to reflect individual-difference constructs that would be most predictive of individuals’ decisions to leave their job. Therefore, if the traditional job analysis method is not sufficient, how are predictors of turnover that might be used in a selection context best identified, and what implications does this have for content-oriented validation strategies discussed earlier?

With regard to the first question, simply performing an analysis of the work context to identify what fit-related individual difference constructs (such as the interests or work values) it supports for job incumbents might be insufficient. Indeed, identifying interests and values that are not supported by a local work context (but that an applicant desires) could be just as critical to subsequent measure development from a prediction standpoint (e.g., Van Iddekinge, Putka, & Campbell, 2011 ). On a related note, given that precursors of turnover have been tied not only to features of jobs, but also to features of the broader organizational context in which individuals work, any turnover-centered job analysis method should focus on the broader organizational context in which the work is performed, as well as on the job itself.

The scientific literature on personnel selection is scant with regard to the questions raised above. In general, the notion of developing and establishing validity evidence for predictors of employee turnover that would be appropriate to use in a selection context is an area in need of far greater attention. Such research would be of great help to practitioners tasked with establishing validity evidence when turnover is a criterion of concern. Thus, we call for future researchers to develop a more explicit framework for establishing validity evidence when turnover factors into the definition of the criterion construct domain, and integrating that framework into the existing one commonly discussed when job performance is the criterion.

Conclusions

Validity is a concept with a long and evolving history. Similarly, validation processes continue to evolve based on new methodological and technological developments (e.g., validity generalization, structural equations modeling). We have attempted to outline the central issues in applying the concept of validity to the domain of personnel selection, and to provide an overview of the approaches available for addressing the primary validity issue in personnel selection: establishing the predictive inference of a relation between observed predictor scores and the criterion construct domain of interest.

Although typically viewed as a factor that limits the feasibility of local criterion-related validity studies, all criterion-related strategies we review are based, at some point, on observed criteria (e.g., primary studies in meta-analytic VG studies, original studies that provide the basis of transportability evidence). As such, the issues we discuss here have implications not only for the veracity of Linkage 4 for local studies, but also other criterion-related strategies as well (e.g., Oswald & McCloy, 2003 ).

Note, sometimes this point is referred to as the 90% credibility value .

Guion's original article discussed not a sampling of items/content as much as a sampling of behaviors. Thus, the stimuli were not as front-and-center as the responses to the stimuli. Today's discussions seem to have melded the two into one, with much more focus now on the sampling domain of the test content (items, exercises, whatever).

Portions of this section were drawn from a 2006 technical report produced by the Human Resources Research Organization (HumRRO) and authored by Dan J. Putka, this chapter's second author.

Note, the references cited here address literature that has examined drift in item parameters from an item response theory (IRT) perspective. However, under a streaming validation model, “item drift” could also be tracked for other parameters of interest in personnel selection (e.g., item-level criterion-related validities, item-level mean subgroup differences).

Aguinis, H., & Whitehead, R. ( 1997 ). Sampling variance in the correlation coefficient under indirect range restriction: Implications for validity generalization.   Journal of Applied Psychology , 82 , 528–538.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education ( 1999 ). Standards for educational and psychological testing . Washington, DC: American Educational Research Association.

Google Scholar

Google Preview

American Psychological Association. ( 1954 ). Technical recommendations for psychological tests and diagnostic techniques.   Psychological Bulletin Supplement , 51 (2, Part 2), 1–38.

Arvey, R. D., Landon, T. E., Nutting, S. M., & Maxwell, S. E. ( 1992 ). Development of physical ability tests for police officers: A construct validation approach.   Journal of Applied Psychology, 77, 996–1009.

Austin, J. T., & Villanova, P. ( 1992 ). The criterion problem: 1917–1992.   Journal of Applied Psychology , 77 , 836–874.

Barrett, G. V. ( 2008 ). Practitioner's view of personality testing and industrial–organizational psychology: Practical and legal issues.   Industrial and Organizational Psychology: Perspectives on Science and Practice , 1 , 299–302.

Barrett, G. V., Miguel, R. F., Hurd, J. M., Lueke, S. B., & Tan, J. A. ( 2003 ). Practical issues in the use of personality tests in police selection.   Public Personnel Management , 32 , 497–517.

Bennett, R. J., & Robinson, S. L. ( 2000 ). Development of a measure of workplace deviance.   Journal of Applied Psychology, 85, 349–360.

Bennett, W., Lance, C. E., & Woehr, D. J. ( 2006 ). Performance measurement: Current perspectives and future challenges . Mahwah, NJ: Lawrence Erlbaum Associates.

Berry, C. M., Ones, D. S., & Sackett P. R. ( 2007 ). Interpersonal deviance, organizational deviance, and their common correlates: A review and meta-analysis.   Journal of Applied Psychology , 92 , 410–424.

Berry, C. M., Sackett, P. R., & Tobares, V. ( 2010 ). A meta-analysis of conditional reasoning tests of aggression.   Personnel Psychology, 63, 361–384.

Binning, J. F., & Barrett, G. V. ( 1989 ). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases.   Journal of Applied Psychology , 74 , 478–494.

Bobko, P. ( 1983 ). An analysis of correlations corrected for attenuation and range restriction.   Journal of Applied Psychology, 68, 584–589.

Bobko, P., Roth, P. L., & Bobko, C. ( 2001 ). Correcting the effect size of d for range restriction and unreliability.   Organizational Research Methods , 4 , 46–61.

Bobko, P., Roth, P. L., & Buster, M. A. (2005, June). A systematic approach for assessing the recency of job-analytic information . Presentation at the 29th Annual International Public Management Association Assessment Council Conference, Orlando, FL.

Brannick, M. T. ( 2001 ). Implications of empirical Bayes meta-analysis for test validation.   Journal of Applied Psychology, 86, 468–480.

Campbell, J. P., & Knapp, D. J. (Eds.) ( 2001 ). Exploring the limits in personnel selection and classification . Mahwah, NJ: Lawrence Erlbaum Associates.

Chan, W., & Chan, D. W. ( 2004 ). Bootstrap standard error and confidence intervals for the correlation corrected for range restriction: A simulation study.   Psychological Methods , 9 , 369–385.

Cohen, J. ( 1988 ). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Crocker, L., & Algina, J. ( 1986 ). Introduction to classical and modern test theory . Orlando, FL: Harcourt Brace Jovanovich, Inc.

Cronbach, L. J. ( 1971 ). Test validation. In R. L. Thorndike (Ed.), Educational measurement (pp. 221–237). Washington, DC: American Council on Education.

Cronbach, L. J., & Meehl, P. E. ( 1955 ). Construct validity in psychological tests.   Psychological Bulletin, 52, 281–300.

Cureton, E. E. ( 1951 ). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621–694). Washington DC: American Council on Education.

De Corte, W., Lievens, F., & Sackett, P. R. ( 2006 ). Predicting adverse impact and mean criterion performance in multistage selection.   Journal of Applied Psychology , 91 , 523–537.

DeMars, C. ( 2004 ). Detection of item parameter drift over multiple test administrations.   Applied Measurement in Education, 17, 265–300.

Donoghue, J. R., & Isham, S. P. ( 1998 ). A comparison of procedures to detect item parameter drift.   Applied Psychological Measurement, 22, 33–51.

Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice. ( 1978 ). Uniform guidelines on employee selection procedures.   Federal Register , 43 , 38294–38309.

Finch, D. M., Edwards, B. D., & Wallace, J. C. ( 2009 ). Multistage selection strategies: Simulating the effects on adverse impact and expected performance for various predictor combinations.   Journal of Applied Psychology , 94 , 318–340.

Gibson, W. M., & Caplinger, J. A. ( 2007 ). Transportation of validation results. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 29–81). Hoboken, NJ: John Wiley & Sons Inc.

Green, B. F., & Mavor, A. S. (Eds.) ( 1994 ). Modeling cost and performance for military enlistment: Report of a workshop . Washington, DC: National Academy Press.

Griffith, R. W., & Hom, P. W. ( 2001 ). Retaining valued employees . Thousand Oaks, CA: Sage Publications.

Griffith, R. W., Hom, P. W., & Gaertner, S. ( 2001 ). A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium.   Journal of Management, 26, 463–488.

Griffith, R. L., & Peterson, M. ( 2006 ). A closer examination of applicant faking behavior. Greenwich, CT: Information Age Publishing.

Guion, R. M. ( 1965 ). Synthetic validity in a small company: A demonstration.   Personnel Psychology, 18, 49–63.

Guion, R. M. ( 1974 ). Open a new window: Validities and values in psychological measurement.   American Psychologist, 29, 287–296.

Guion, R. M. ( 1977 ). Content validity: The source of my discontent.   Applied Psychological Measurement, 1, 1–10.

Guion, R. M. ( 1998 ). Assessment, measurement, and prediction for personnel decisions . Mahwah, NJ: Lawrence Erlbaum Associates.

Guion, R. M. ( 2009 ). Was this trip necessary?   Industrial and Organizational Psychology, 2, 465–468.

Hakstian, A. R., Schroeder, M. L., & Rogers, W. T. ( 1988 ). Inferential procedures for correlation coefficients corrected for attenuation.   Psychometrika , 53 , 27–43.

Handler, C. (2004, April). Technology's role in the evolution of acceptable test validation strategies . Panel discussion at the 19th Annual Society for Industrial and Organizational Psychology Conference, Chicago, IL.

Hamilton, J. W., & Dickinson, T. L. ( 1987 ). Comparison of several procedures for generating J-coefficients.   Journal of Applied Psychology, 72, 49–54.

Hoffman, C. C., Holden, L. M., & Gale, K. ( 2000 ). So many jobs, so little “N”: Applying expanded validation models to support generalization of cognitive test validity.   Personnel Psychology , 53 , 955–991.

Hoffman, C. C., Rashkovsky, B., & D'Egidio, E. ( 2007 ). Job component validity: Background, current research, and applications. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 82–121). Hoboken, NJ: John Wiley & Sons Inc.

Hogan, J., Davies, S. , & Hogan, R. ( 2007 ). Generalizing personality-based validity evidence. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 181–229). Hoboken, NJ: John Wiley & Sons Inc.

Hogan, J., & Holland, B. ( 2003 ) Using theory to evaluate personality and job performance relationships.   Journal of Applied Psychology, 88,100–112.

Howard, A. (Ed.). ( 1995 ). The changing nature of work . San Francisco: Jossey-Bass.

Hunter, J. E., & Schmidt. F. L. ( 1990 ). Methods of meta-analysis, correcting error and bias in research findings . Newbury Park, CA: Sage.

Illgen, D. R., & Pulakos, E. D. (Eds.). ( 1999 ). The changing nature of performance: Implications for staffing, motivation, and development . San Francisco: Jossey-Bass.

Jenkins, J. G. ( 1946 ). Validity for what?   Journal of Consulting Psychology , 10 , 93–98.

Johnson, J. W. ( 2007 ). Synthetic validity: A technique of use (finally). In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 122–158). Hoboken, NJ: John Wiley & Sons Inc.

Johnson, M, & Jolly, J. ( 2000 ). Extending test validation results from one plan location to another: Application of transportability evidence.   The Journal of Behavioral and Applied Management, 1, 127.

Kane, M. T. ( 2006 ). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education and Praeger Publishers.

Kristof-Brown, A. L., Zimmerman, R. D., & Johnson, C. ( 2005 ). Consequences of individuals’ fit at work: A meta-analysis of person-job, person-organization, person-group, and person-supervisor fit.   Personnel Psychology, 58, 281–342.

Landon, T. E., & Arvey, R. D. ( 2007 ). Practical construct validation for personnel selection. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 317–345). Hoboken, NJ: John Wiley & Sons Inc.

Landy, F. J. ( 1986 ). Stamp collecting versus science: Validation as hypothesis testing.   American Psychologist , 41 , 1183–1192.

Landy, F. J. ( 2007 ). The validation of personnel decisions in the twenty-first century: Back to the future. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 409–426). Hoboken, NJ: John Wiley & Sons Inc.

Locke, E. A. ( 2007 ). The case for inductive theory building.   Journal of Management, 33, 867–890.

Loevinger, J. ( 1957 ). Objective tests as instruments of psychological theory [Monograph No. 9].   Psychological Reports, 3, 635–694.

Mackenzie, S. B., Podsakoff, P. M., & Jarvis, C. B. ( 2005 ). The problem of measurement model misspecification in behavioral and organizational research and some recommended solutions.   Journal of Applied Psychology, 90, 710–730.

McDaniel, M. A. ( 2007 ). Validity generalization as a test validation approach. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 159–180). Hoboken, NJ: John Wiley & Sons Inc.

McDaniel, M. A., Rothstein, H. R., & Whetzel, D. L. ( 2006 ). Publication bias: A case study of four test vendors.   Personnel Psychology, 59, 927–953.

McPhail, S. M. (Ed.) ( 2007 ). Alternative validation strategies: Developing new and leveraging existing validity evidence . Hoboken, NJ: John Wiley & Sons Inc.

Messick, S. ( 1981 ). Evidence and ethics in the evaluation of tests.   Educational Researcher , 10 , 9–20.

Messick, S. ( 1989 ). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan.

Murphy, K. R. ( 2003 ). The logic of validity generalization. In K. R. Murphy (Ed.), Validity generalization: A critical review (pp. 1–29). Mahwah, NJ: Lawrence Erlbaum Associates.

Murphy, K. R. ( 2008 ). Explaining the weak relationship between job performance and ratings of job performance.   Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 148–160.

Murphy, K. R. ( 2009 ). Content validation is useful for many things, but validity isn't one of them.   Industrial and Organizational Psychology: Perspectives on Science and Practice , 2 , 453–464.

Murphy, K. R., & DeShon, R. ( 2000 ). Interrater correlations do not estimate the reliability of job performance ratings.   Personnel Psychology , 53 , 873–900.

Murphy, K. R., Myors, B., & Wolach, A. ( 2009 ). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (3rd ed.). New York: Routledge/Taylor & Francis Group.

Murphy, K. R., & Shiarella, A. H. ( 1997 ). Implications of the multidimensional nature of job performance for the validity of selection tests: Multivariate frameworks for studying test validity.   Personnel Psychology , 50 , 823–854.

Newman, D. A., Jacobs, R. R., & Bartram, D. ( 2007 ). Choosing the best method for local validity estimation: Relative accuracy of meta-analysis versus a local study versus Bayes-analysis.   Journal of Applied Psychology , 92 , 1394–1413.

Ones, D. S. (1993). The construct validity of integrity tests . Unpublished Ph.D. dissertation, August, 1993, University of Iowa.

Ones, D. S., Viswesvaran, C. & Schmidt, F. ( 1993 ). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance.   Journal of Applied Psychology Monograph, 78, 679–703.

Oswald, F. L., & McCloy, R. A. ( 2003 ). Meta-analysis and the art of the average. In Murphy, K. R. (Ed.), Validity generalization: A critical review (pp. 311–338). Mahwah, NJ: Lawrence Erlbaum Associates.

Pearlman, K., Schmidt, F. L., & Hunter, J. E. ( 1980 ). Validity generalization results for tests used to predict job proficiency and training success in clerical occupations.   Journal of Applied Psychology , 65 , 373–406.

Peterson, N.G., Wise, L.L., Arabian, J. , & Hoffman, R.G. ( 2001 ). Synthetic validation and validity generalization: When empirical validation is not possible. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits of personnel selection and classification (pp. 411–451). Mahwah, NJ: Lawrence Erlbaum Associates.

Ployhart, R. E., Schneider, B., & Schmitt, N. ( 2006 ). Staffing organizations: Contemporary practice and theory (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Putka, D. J., McCloy, R. A., Ingerick, M., O'Shea, P. G., & Whetzel, D. L. ( 2009 ). Links among bases of validation evidence: Absence of empirical evidence is not evidence of absence.   Industrial and Organizational Psychology: Perspectives on Science and Practice , 2 (4), 475–480.

Putka, D., & Sackett, P. R. ( 2010 ). Reliability and validity. In J. Farr & N. Tippins (Eds.), Handbook of personnel selection (pp. 9–49). Mahwah, NJ: Lawrence Erlbaum Associates.

Raju, N. S., & Brand, P. A. ( 2003 ). Determining the significance of correlations corrected for unreliability and range restriction.   Applied Psychological Measurement , 27 (1), 52–71.

Roberts, B. W., Chernyshenko, O. S., Stark, S., & Goldberg, L. R. ( 2005 ). The structure of conscientiousness: An empirical investigation based on seven major personality questionnaires.   Personnel Psychology , 58 , 103–139.

Sackett, P. R. ( 2003 ). The status of validity generalization research: Key issues in drawing inferences from cumulative research findings. In K. R. Murphy (Ed.), Validity generalization: A critical review (pp. 91–114). Mahwah, NJ: Lawrence Erlbaum Associates.

Sackett, P. R., Kuncel, N. R., Arneson, J. J., Cooper, S. R., & Waters, S. D. ( 2009 ). Does socio-economic status explain the relationship between admissions tests and post-secondary academic performance?   Psychological Bulletin, 135, 1–22.

Sackett, P. R., & Roth, L. ( 1996 ). Multi-stage selection strategies: A Monte Carlo investigation of effects on performance and minority hiring.   Personnel Psychology , 49 , 549–572.

Sackett, P. R., & Wanek, J. E. ( 1996 ). New developments in the use of measures of honesty, integrity, conscientiousness, dependability, trustworthiness, and reliability for personnel selection.   Personnel Psychology 47, 787–829.

Sackett, P. R., & Yang, H. ( 2000 ). Correction for range restriction: An expanded typology.   Journal of Applied Psychology , 85 , 112–118.

Scherbaum, C. A. ( 2005 ). Synthetic validity: Past, present, and future.   Personnel Psychology , 58 , 481–515.

Schmidt, F. L., & Hunter, J. E. ( 1977 ). Development of a general solution to the problem of validity generalization.   Journal of Applied Psychology , 62 , 529–540.

Schmidt, F. L., & Hunter, J. E. ( 1996 ). Measurement error in psychological research: Lessons from 26 research scenarios.   Psychological Methods , 1 , 199–223.

Schmidt, F. L., & Raju, N. S. ( 2007 ). Updating meta-analytic research findings: Bayesian approaches versus the medical model.   Journal of Applied Psychology , 92 , 297–308.

Schmidt, F. L., Viswesvaran, C., & Ones, D. S. ( 2000 ). Reliability is not validity and validity is not reliability.   Personnel Psychology, 53, 901–912.

Schmitt, N., & Fandre, J. ( 2008 ). The validity of current selection methods. In S. Cartwright & C. L. Cooper (Eds.), Oxford handbook of personnel psychology (pp. 163–193). Oxford: Oxford University Press.

Schmitt, N., & Sinha, R. ( 2010 ). Validation support for selection procedures. In S. Zedeck (Ed.), APA handbook of industrial and organizational psychology, Vol 2: Selecting and developing members for the organization (pp. 399–420). Washington, DC: American Psychological Association.

Smith, D. B. , & Robie, C. ( 2004 ). The implications of impression management for personality research in organizations. In B. Schneider & D. B. Smith (Eds.), Personality and organizations (pp. 111–138). Mahwah, NJ: Lawrence Erlbaum Associates.

Society for Industrial and Organizational Psychology. ( 2003 ). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: SIOP.

Steel, P. D. G., Huffcutt, A. I., & Kammeyer-Mueller, J. ( 2006 ). From the work one knows the worker: A systematic review of the challenges, solutions, and steps to creating synthetic validity.   International Journal of Selection and Assessment , 14 , 16–36.

Sussmann, M., & Robertson, D. U. ( 1986 ). The validity of validity: An analysis of validation study designs.   Journal of Applied Psychology , 71 , 461–468.

Tett, R. P., & Burnett, D. D. ( 2003 ). A personality trait-based interactionist model of job performance.   Journal of Applied Psychology , 88 , 500–517.

Thornton, G. C. III. ( 2009 ). Evidence of content matching is evidence of validity.   Industrial and Organizational Psychology: Perspectives on Science and Practice, 2, 469–474.

Tonowski, R. F. ( 2009 ). “ Content” still belongs with “validity. ” Industrial and Organizational Psychology: Perspectives on Science and Practice , 2 , 481–485.

Van Iddekinge, C. H., & Ployhart, R. E. ( 2008 ). Developments in the criterion-related validation of selection procedures: A critical review and recommendations for practice.   Personnel Psychology , 61 , 871–925.

Van Iddekinge, C. H., Putka, D. J., & Campbell, J. P. ( 2011 ). Reconsidering vocational interests for personnel selection: The validity of an interest-based selection test in relation to job knowledge, job performance, and continuance intentions.   Journal of Applied Psychology, 96, 13–33.

Vance, R., Coovert, M. D., MacCallum, R. C., & Hedge, J. W. ( 1989 ). Construct models of job performance.   Journal of Applied Psychology, 74, 447–455.

Weitz, J. ( 1961 ). Criteria for criteria.   American Psychologist , 16 , 228–231.

White, L. A., Young, M. C., Hunter, A. E., & Rumsey, M. G. ( 2008 ). Lessons learned in transitioning personality measures from research to operational settings.   Industrial and Organizational Psychology, 1, 291–295.

Whitener, E. M. ( 1990 ). Confusion of confidence intervals and credibility intervals in meta-analysis.   Journal of Applied Psychology , 75 , 315–321.

  • About Oxford Academic
  • Publish journals with us
  • University press partners
  • What we publish
  • New features  
  • Open access
  • Institutional account management
  • Rights and permissions
  • Get help with access
  • Accessibility
  • Advertising
  • Media enquiries
  • Oxford University Press
  • Oxford Languages
  • University of Oxford

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide

  • Copyright © 2024 Oxford University Press
  • Cookie settings
  • Cookie policy
  • Privacy policy
  • Legal notice

This Feature Is Available To Subscribers Only

Sign In or Create an Account

This PDF is available to Subscribers Only

For full access to this pdf, sign in to an existing account, or purchase an annual subscription.

A Step-by-Step Guide to Analytical Method Development and Validation

Method development and validation are essential components of drug development and chemistry manufacturing and controls (CMC). The goal of method development and validation is to  ensure that the methods used to measure the identity, purity, potency, and stability of drugs are accurate, precise, and reliable. Analytical methods are critical tools for ensuring the quality, safety, and efficacy of pharmaceutical products in the drug development process. Analytical development services performed at Emery Pharma are outlined below.

Analytical Method Development Overview:

Analytical method development is the process of selecting and optimizing analytical methods to measure a specific attribute of a drug substance or drug product. This process involves a systematic approach to evaluating and selecting suitable methods that are sensitive, specific, and robust, and can be used to measure the target attribute within acceptable limits of accuracy and precision.

Method Validation Overview:

Method validation is the process of demonstrating that an analytical method is suitable for its intended use, and that it is capable of producing reliable and consistent results over time. The validation process involves a set of procedures and tests designed to evaluate the performance characteristics of the method.

Components of method validation include:

  • Specificity
  • Limit of detection (LOD)
  • Limit of quantification (LOQ)

Depending on the attribute being assayed, we use state-of-the-art instrumentation such as HPLC (with UV-Vis/DAD, IR, CAD, etc. detectors), LC-MS, HRMS, MS/MS, GC-FID/MS, NMR, plate readers, etc.

At Emery Pharma, we follow a prescribed set of key steps per regulatory (FDA, EMA, etc.) guidance, as well as instructions from the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) for any analytical method development and validation.

Step 1: Define the Analytical Method Objectives

The first step in analytical method development and validation is to define the analytical method objectives, including the attribute to be measured, the acceptance criteria, and the intended use of the method. This step involves understanding the critical quality attributes (CQAs) of the drug product or drug substance and selecting appropriate analytical methods to measure them.

For example, impurity profile of a drug substance assessment require suitable HPLC-based methods for small molecules, whereas host cell proteins (impurity-equivalent) for a Biologic drug substance require ligand binding assays (LBAs) such as ELISA for overview, and LC-HRMS-based analysis for thorough understanding.

Step 2: Conduct a Literature Review

Next, a literature review is conducted to identify existing methods and establish a baseline for the method development process. This step involves reviewing scientific literature, regulatory guidance, and industry standards to determine the current state of the art and identify potential methods that may be suitable for the intended purpose.

At Emery Pharma, we have worked on and have existing programs on virtually all type of drug modalities, thus we have access to many validated internal methods to tap into as well.

Step 3: Develop a Method Plan

The next step is to develop a method plan that outlines the methodology, instrumentation, and experimental design for method development and validation. The plan includes the selection of suitable reference standards, the establishment of performance characteristics, and the development of protocols for analytical method validation.

Step 4: Optimize the Method

Next, the analytical method is optimized to ensure that it is sensitive, specific, and robust. This step involves evaluating various parameters, such as sample preparation, column selection, detector selection, mobile phase composition, and gradient conditions, to optimize the method performance.

Step 5: Validate the Method

The critical next step is to validate the analytical method to ensure that it meets the performance characteristics established in the method plan. This step involves evaluating the method's accuracy, precision, specificity, linearity, range, LOD, LOQ, ruggedness, and robustness.

Depending on the stage of development, validation may be performed under Research and Development (R&D), however, most Regulatory submissions require method validation be conducted per 21 CFR Part 58 on Good Laboratory Practices (GLP). To that end, Emery Pharma has an in-house Quality Assurance department that ensures compliance and can play host to regulators/auditors.

Step 6: (Optional) Transfer the Method

In some instances, e.g., clinical trials with multiple international sites, the validated method may need to be transferred to another qualified laboratory. We routinely help our Clients get several parallel sites up to speed on new validated methods, and support with training analysts on the method, documenting the method transfer process, and conducting ongoing monitoring and maintenance of the method.

Step 7: Sample Analysis

The final step of an analytical method development Validation process is developing a protocol and initiate sample analysis.

At Emery Pharma, depending on the stage of development, sample analysis is conducted under R&D or in compliance with 21 CFR Part 210 and 211 for current Good Manufacturing Procedures (cGMP). We boast an impressive array of qualified instrumentation that can be deployed for cGMP sample analysis, which is overseen by our Quality Assurance Director for compliance and proper reporting.

Let us be a part of your success story

Emery Pharma has decades of experience in analytical method development and validation. We strive to implement procedures that help to ensure new drugs are manufactured to the highest quality standards and are safe and effective for patient use.

assignment process & validations

Emery Pharma

Request for proposal, let us be a part of your success story..

Do you have questions regarding a potential project? Or would you like to learn more about our services? Please reach out to a member of the Emery Pharma team via the contact form, and one of our experts will be in touch soon as possible. We look forward to working with you!

assignment process & validations

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • Adv Simul (Lond)

Logo of advsim

Validation of educational assessments: a primer for simulation and beyond

David a. cook.

1 Mayo Clinic Online Learning, Mayo Clinic College of Medicine, Rochester, MN USA

2 Office of Applied Scholarship and Education Science, Mayo Clinic College of Medicine, Rochester, MN USA

3 Division of General Internal Medicine, Mayo Clinic College of Medicine, Mayo 17-W, 200 First Street SW, Rochester, MN 55905 USA

Rose Hatala

4 Department of Medicine, University of British Columbia, Vancouver, British Columbia Canada

Associated Data

Not applicable.

Simulation plays a vital role in health professions assessment. This review provides a primer on assessment validation for educators and education researchers. We focus on simulation-based assessment of health professionals, but the principles apply broadly to other assessment approaches and topics.

Key principles

Validation refers to the process of collecting validity evidence to evaluate the appropriateness of the interpretations, uses, and decisions based on assessment results. Contemporary frameworks view validity as a hypothesis, and validity evidence is collected to support or refute the validity hypothesis (i.e., that the proposed interpretations and decisions are defensible). In validation, the educator or researcher defines the proposed interpretations and decisions, identifies and prioritizes the most questionable assumptions in making these interpretations and decisions (the “interpretation-use argument”), empirically tests those assumptions using existing or newly-collected evidence, and then summarizes the evidence as a coherent “validity argument.” A framework proposed by Messick identifies potential evidence sources: content, response process, internal structure, relationships with other variables, and consequences. Another framework proposed by Kane identifies key inferences in generating useful interpretations: scoring, generalization, extrapolation, and implications/decision. We propose an eight-step approach to validation that applies to either framework: Define the construct and proposed interpretation, make explicit the intended decision(s), define the interpretation-use argument and prioritize needed validity evidence, identify candidate instruments and/or create/adapt a new instrument, appraise existing evidence and collect new evidence as needed, keep track of practical issues, formulate the validity argument, and make a judgment: does the evidence support the intended use?

Conclusions

Rigorous validation first prioritizes and then empirically evaluates key assumptions in the interpretation and use of assessment scores. Validation science would be improved by more explicit articulation and prioritization of the interpretation-use argument, greater use of formal validation frameworks, and more evidence informing the consequences and implications of assessment.

Good assessment is important; simulation can help

Educators, administrators, researchers, policymakers, and even the lay public recognize the importance of assessing health professionals. Trending topics such as competency-based education, milestones, and mastery learning hinge on accurate, timely, and meaningful assessment to provide essential information about performance. Assessment of professional competence increasingly extends beyond training into clinical practice, with ongoing debates regarding the requirements for initial and ongoing professional licensure and certification. Front-line educators and education researchers require defensible assessments of health professionals in clinical and nonclinical settings. Indeed, the need for good assessments has never been greater and will most likely continue to grow.

Although workplace-based assessment is essential [ 1 – 3 ], simulation does and will continue to play a vital role in health professions assessment, inasmuch as it permits the targeting of specific topics and skills in a safe environment [ 4 – 6 ]. The conditions of assessment can be standardized across learners, and the spectrum of disease, clinical contexts, and comorbidities can be manipulated to focus on, for example, common yet critical tasks, infrequently seen conditions, activities that might put patients at risk, or situations that provoke specific emotional responses [ 7 , 8 ]. Thus, it comes as no surprise that simulation-based assessment is increasingly common. A review published in 2013 identified over 400 studies evaluating simulation-based assessments [ 9 ], and that number has surely grown. However, that same review identified serious and frequent shortcomings in the evidence supporting these assessments, and in the research studies designed to collect such evidence (i.e., validation studies). The gap between the need for good simulation-based assessment and the deficiencies in the process and product of current validation efforts suggests the need for increased awareness of the current state of the science of validation.

The purpose of this article is to provide a primer on assessment validation for educators and education researchers. We focus on the context of simulation-based assessment of health professionals but believe the principles apply broadly to other assessment approaches and topics.

Validation is a process

Validation refers to the process of collecting validity evidence to evaluate the appropriateness of the interpretations, uses, and decisions based on assessment results [ 10 ]. This definition highlights several important points. First, validation is a process not an endpoint. Labeling an assessment as “validated” means only that the validation process has been applied—i.e., that evidence has been collected. It does not tell us what process was used, the direction or magnitude of the evidence (i.e., was it favorable or unfavorable and to what degree?), what gaps remain, or for what context (learner group, learning objectives, educational setting) the evidence is relevant.

Second, validation involves the collection of validity evidence, as we discuss in a following section.

Third, validation and validity ultimately refer to a specific interpretation or use of assessment data, be these numeric scores or narrative comments [ 11 ], and to the decisions grounded in this interpretation. We find it helpful to illustrate this point through analogy with diagnostic tests in clinical medicine [ 12 ]. A clinical test is only useful to the degree that (a) the test influences decisions, and (b) these decisions lead to meaningful changes in action or patient outcomes. Hence, physicians are often taught, “Don’t order the test if it won’t change patient management.” For example, the prostate-specific antigen (PSA) test has high reliability and is strongly associated with prostate cancer. However, this test is no longer widely recommended in screening for prostate cancer because it is frequently elevated when no cancer is present, because testing leads to unnecessary prostate biopsies and patient anxiety, and because treating cancers that are found often does not improve clinical outcomes (i.e., treatment is not needed). In other words, the negative/harmful consequences outweigh the beneficial consequences of testing (screening) in many patients [ 13 – 15 ]. However, PSA testing is still useful as a marker of disease once prostate cancer has been diagnosed and treated. Reflecting this example back to educational tests (assessments) and the importance of decisions: (1) if it will not change management the test should not be done, (2) a test that is useful for one objective or setting may be less useful in another context, and (3) the long-term and downstream consequences of testing must be considered in determining the overall usefulness of the test.

Why is assessment validation important?

Rigorous validation of educational assessments is critically important for at least two reasons. First, those using an assessment must be able to trust the results. Validation does not give a simple yes/no answer regarding trustworthiness (validity); rather, a judgment of trustworthiness or validity depends on the intended application and context and is typically a matter of degree. Validation provides the evidence to make such judgments and a critical appraisal of remaining gaps.

Second, the number of assessment instruments, tools, and activities is essentially infinite, since each new multiple-choice question, scale item, or exam station creates a de facto new instrument. Yet, for a given educator, the relevant tasks and constructs in need of assessment are finite. Each educator thus needs information to sort and sift among the myriad possibilities to identify the assessment solution that best meets his or her immediate needs. Potential solutions include selecting an existing instrument, adapting an existing instrument, combining elements of several instruments, or creating a novel instrument from scratch [ 16 ]. Educators need information regarding not only the trustworthiness of scores, but also the logistics and practical issues such as cost, acceptability, and feasibility that arise during test implementation and administration.

In addition, simulation-based assessments are almost by definition used as surrogates for a more “meaningful” clinical or educational outcome [ 17 ]. Rarely do we actually want to know how well learners perform in a simulated environment; usually, we want to know how they would perform in real life. A comprehensive approach to validation will include evaluating the degree to which assessment results extrapolate to different settings and outcomes [ 18 , 19 ].

What do we mean by validity evidence?

Classical validation frameworks identified at least three different “types” of validity: content , construct , and criterion ; see Table  1 . However, this perspective has been replaced by more nuanced yet unified and practical views of validity [ 10 , 12 , 20 ]. Contemporary frameworks view validity as a hypothesis, and just as a researcher would collect evidence to support or refute a research hypothesis, validity evidence is collected to support or refute the validity hypothesis (more commonly referred to as the validity argument). Just as one can never prove a hypothesis, validity can never be proven; but evidence can, as it accumulates, support or refute the validity argument.

The classical validity framework

a Some authors also include “face validity” as a fourth type of validity in the classical framework. However, face validity refers either to superficial appearances that have little merit in evaluating the defensibility of assessment [ 26 , 59 ] (like judging the speed of the car by its color) or to influential features that are better labeled content validity (like judging the speed of the car by its model or engine size). We discourage use of the term "face validity"

The first contemporary validity framework was proposed by Messick in 1989 [ 21 ] and adopted as a standard for the field in 1999 [ 22 ] and again in 2014 [ 23 ]. This framework proposes five sources of validity evidence [ 24 – 26 ] that overlap in part with the classical framework (see Table  2 ). Content evidence, which is essentially the same as the old concept of content validity, refers to the steps taken to ensure that assessment items (including scenarios, questions, and response options) reflect the construct they are intended to measure. Internal structure evidence evaluates the relationships of individual assessment items with each other and with the overarching construct(s), e.g., reliability, domain or factor structure, and item difficulty. Relationships with other variables evidence evaluates the associations, positive or negative and strong or weak, between assessment results and other measures or learner characteristics. This corresponds closely with classical notions of criterion validity and construct validity. Response process evidence evaluates how well the documented record (answer, rating, or free-text narrative) reflects the observed performance. Issues that might interfere with the quality of responses include poorly trained raters, low-quality video recordings, and cheating. Consequences evidence looks at the impact, beneficial or harmful, of the assessment itself and the decisions and actions that result [ 27 – 29 ]. Educators and researchers must identify the evidence most relevant to their assessment and corresponding decision, then collect and appraise this evidence to formulate a validity argument. Unfortunately, the “five sources of evidence” framework provides incomplete guidance in such prioritization or selection of evidence.

The five sources of evidence validity framework

See the following for further details and examples [ 20 , 25 , 26 ]

The most recent validity framework, from Kane [ 10 , 12 , 30 ], addresses the issue of prioritization by identifying four key inferences in an assessment activity (Table  3 ). For those accustomed to the classical or five-evidence-sources framework, Kane’s framework is often challenging at first because the terminology and concepts are entirely new. In fact, when learning this framework, we have found that it helps to not attempt to match concepts with those of earlier frameworks. Rather, we begin de novo by considering conceptually the stages involved in any assessment activity. An assessment starts with a performance of some kind, such as answering a multiple-choice test item, interviewing a real or standardized patient, or performing a procedural task. Based on this observation, a score or written narrative is documented that we assume reflects the level of performance; several scores or narratives are combined to generate an overall score or interpretation that we assume reflects the desired performance in a test setting; the performance in a test setting is assumed to reflect the desired performance in a real-life setting; and that performance is further assumed to constitute a rational basis for making a meaningful decision (see Fig.  1 ). Each of these assumptions represents an inference that might not actually be justifiable. The documentation of performance (scoring inference) could be inaccurate; the synthesis of individual scores might not accurately reflect performance across the desired test domains (generalization inference); the synthesized score also might not reflect real-life performance (extrapolation inference); and this performance (in a test setting or real life) might not form a proper foundation for the desired decision (implications or decision inference). Kane’s validity framework explicitly evaluates the justifications for each of these four inferences. We refer those wishing to learn more about Kane’s framework to his description [ 10 , 30 ] and to our recent synopsis of his work [ 12 ].

The validation inferences validity framework

See Kane [ 10 ] and Cook et al [ 12 ] for further details and examples

a Each of the inferences reflects assumptions about the creation and use of assessment results

An external file that holds a picture, illustration, etc.
Object name is 41077_2016_33_Fig1_HTML.jpg

Key inferences in validation

Educators and researchers often ask how much validity evidence is needed and how the evidence from a previous validation applies when an instrument is used in a new context. Unfortunately, the answers to these questions depend on several factors including the risk of making a wrong decision (i.e., the “stakes” of the assessment), the intended use, and the magnitude and salience of contextual differences. While all assessments should be important, some assessment decisions have more impact on a learner’s life than others. Assessments with higher impact or higher risk, including those used for research purposes, merit higher standards for the quantity, quality, and breadth of evidence. Strictly speaking, validity evidence applies only to the purpose, context, and learner group in which it was collected; existing evidence might guide our choice of assessment approach but does not support our future interpretations and use. Of course, in practice, we routinely consider existing evidence in constructing a validity argument. Whether old evidence applies to a new situation requires a critical appraisal of how situational differences might influence the relevance of the evidence. For example, some items on a checklist might be relevant across different tasks while others might be task-specific; reliability can vary substantially from one group to another, with typically lower values among more homogeneous learners; and differences in context (inpatient vs outpatient), learner level (junior medical student vs senior resident), and purpose might affect our interpretation of evidence of content, relations with other variables, or consequences. Evidence collected in contexts similar to ours and consistent findings across a variety of contexts will support our choice to include existing evidence in constructing our validity argument.

What do we mean by validity argument?

In addition to clarifying the four key inferences, Kane has advanced our understanding of “argument” in the validation process by emphasizing two distinct stages of argument: an up-front “interpretation-use argument” or “IUA,” and a final “validity argument.”

As noted above, all interpretations and uses—i.e., decisions—incur a number of assumptions. For example, in interpreting the scores from a virtual reality assessment, we might assume that the simulation task—including the visual representation, the simulator controls, and the task itself—has relevance to tasks of clinical significance; that the scoring algorithm accounts for important elements of that task; that there are enough tasks, and enough variety among tasks, to reliably gauge trainee performance; and that it is beneficial to require trainees to continue practicing until they achieve a target score. These and other assumptions can and must be tested! Many assumptions are implicit, and recognizing and explicitly stating them before collecting or examining the evidence is an essential step. Once we have specified the intended use, we need to (a) identify as many assumptions as possible, (b) prioritize the most worrisome or questionable assumptions, and (c) come up with a plan to collect evidence that will confirm or refute the correctness of each assumption. The resulting prioritized list of assumptions and desired evidence constitute the interpretation-use argument. Specifying the interpretation-use argument is analogous both conceptually and in importance to stating a research hypothesis and articulating the evidence required to empirically test that hypothesis.

Once the evaluation plan has been implemented and evidence has been collected, we synthesize the evidence, contrast these findings with what we anticipated in the original interpretation-use argument, identify strengths and weaknesses, and distill this into a final validity argument. Although the validity argument attempts to persuade others that the interpretations and uses are indeed defensible—or that important gaps remain—potential users should be able to arrive at their own conclusions regarding the sufficiency of the evidence and the accuracy of the bottom-line appraisal. Our work is similar to that of an attorney arguing a case before a jury: we strategically seek, organize, and interpret the evidence and present an honest, complete, and compelling argument, yet it is the “jury” of potential users that ultimately passes judgment on validity for their intended use and context. [ 31 ]

It is unlikely that any single study will gather all the validity evidence required to support a specific decision. Rather, different studies will usually address different aspects of the argument, and educators need to consider the totality of the evidence when choosing an assessment instrument for their context and needs.

Of course, it is not enough for researchers to simply collect any evidence. It is not just the quantity of evidence that matters, but also the relevance, quality, and breadth. Collecting abundant evidence of score reliability does not obviate the need for evidence about content, relationships, or consequences. Conversely, if existing evidence is robust and logically applicable to our context, such as a rigorous item development process, then replicating such efforts may not be top priority. Unfortunately, researchers often inadvertently fail to deliberately prioritize the importance of the assumptions or skip the interpretation-use argument altogether, which can result in reporting evidence for assumptions that are easy to test rather than those that are most critical.

A practical approach to validation

Although the above concepts are essential to understanding the process of validation, it is also important to be able to apply this process in practical ways. Table  4 outlines one possible approach to validation that would work with any of the validity frameworks described above (classical, Messick, or Kane). In this section, we will illustrate this approach using a hypothetical simulation-based example.

Imagine that we are teaching first year internal medicine residents lumbar puncture (LP) using a part-task trainer. At the end of the training session, we wish to assess whether the learners are ready to safely attempt an LP with a real patient under supervision.

Validation begins by considering the construct of interest. For example, are we interested in the learners’ knowledge of LP indications and risks, their ability to perform LP, or their non-technical skills when attempting an LP? Each of these is a different construct requiring selection of a different assessment tool: we might choose multiple-choice questions (MCQs) to assess knowledge, a series of skill stations using a part-task trainer to asses procedural skill with an Objective Structured Assessment of Technical Skills (OSATS) [ 32 ], or a resuscitation scenario using a high-fidelity manikin and a team of providers to assess non-technical skills with the Non-Technical Skills (NOTECHS) scale [ 33 ].

In our example, the construct is “LP skill” and the interpretation is that “learners have fundamental LP skills sufficient to attempt a supervised LP on a real patient.”

Without a clear idea of the decisions we anticipate making based on those interpretations, we will be unable to craft a coherent validity argument.

In our example, our foremost decision is whether the learner has sufficient procedural competence to attempt a supervised LP on a real patient. Other decisions we might alternatively consider include identifying performance points on which to offer feedback to the learner, deciding if the learner can be promoted to the next stage of training, or certifying the learner for licensure.

In making our interpretations and decisions, we will invoke a number of assumptions, and these must be tested. Identifying and prioritizing key assumptions and anticipating the evidence we hope to find allows us to outline an interpretation-use argument [ 30 ].

In our scenario, we are looking for an assessment instrument in which a “pass” indicates competence to attempt a supervised LP on a real patient. We anticipate that this will involve a physician rating student performance on a skills station. Assumptions in this context include that the station is set up to test techniques essential for LP performance (vs generic skills in sterile technique or instrument handling), that the rater is properly trained, that a different rater would give similar scores, and that learners who score higher on the test will perform more safely on their first patient attempt. Considering the evidence we might need to support or refute these assumptions, and using Kane’s framework as a guide, we propose an interpretation-use argument as follows. We do not know at this stage whether evidence has already been collected or if we will need to collect it ourselves, but we have at least identified what to look for.
  • Scoring: the observation of performance is correctly transformed into a consistent numeric score. Evidence will ideally show that the items within the instrument are relevant to LP performance, that raters understood how to use the instrument, and that video-recording performance yields similar scores as direct observation.
  • Generalization: scores on a single performance align with overall scores in the test setting. Evidence will ideally show that we have adequately sampled performance (sufficient number of simulated LPs, and sufficient variety of conditions such as varying the simulated patient habitus) and that scores are reproducible between performances and between raters (inter-station and inter-rater reliability).
  • Extrapolation: assessment scores relate to real-world performance. Evidence will ideally show that scores from the instrument correlate with other LP performance measures in real practice, such as procedural logs, patient adverse events, or supervisor ratings.
  • Implications: the assessment has important and favorable effects on learners, training programs, or patients, and negative effects are minimal. Evidence will ideally show that students feel more prepared following the assessment, that those requiring remediation feel this time was well spent, and that LP complications in real patients decline in the year following implementation.

We cannot over-emphasize the importance of these first three steps in validation. Clearly articulating the proposed interpretations, intended decision(s), and assumptions and corresponding evidence collectively set the stage for everything that follows.

We should identify a measurement format that aligns conceptually with our target construct and then search for existing instruments that meet or could be adapted to our needs. A rigorous search provides content evidence to support our final assessment. Only if we cannot find an appropriate existing instrument would we develop an instrument de novo.

We find a description of a checklist for assessing PGY-1’s procedural competence in LP [ 34 ]. The checklist appears well suited for our purpose, as we will be using it in a similar educational context; we thus proceed to appraising the evidence without changing the instrument.

Although existing evidence does not, strictly speaking apply to our situation, for practical purposes we will rely heavily on existing evidence as we decide whether to use this instrument. Of course, we will want to collect our own evidence as well, but we must base our initial adoption on what is now available.

We begin our appraisal of the validity argument by searching for existing evidence. The original description [ 34 ] offers scoring evidence by describing the development of checklist items through formal LP task analysis and expert consensus. It provides generalization evidence by showing good inter-rater reliability, and adds limited extrapolation evidence by confirming that residents with more experience had higher checklist scores. Other studies using the same or a slightly modified checklist provide further evidence for generalization with good inter-rater reliabilities [ 35 , 36 ], and contribute extrapolation evidence by showing that scores are higher after training [ 35 , 37 ] and that the instrument identified important learner errors when used to rate real patient LPs [ 38 ]. One study also provided limited implications evidence by counting the number of practice attempts required to attain competence in the simulation setting [ 37 ]. In light of these existing studies, we will not plan to collect more evidence before our initial adoption of this instrument. However, we will collect our own evidence during implementation, especially if we identify important gaps, i.e., at later stages in the validation process; see below.

An important yet often poorly appreciated and under-studied aspect of validation concerns the practical issues surrounding development, implementation, and interpretation of scores. An assessment procedure might yield outstanding data, but if it is prohibitively expensive or if logistical or expertise requirements exceed local resources, it may be impossible to implement.

For the LP instrument, one study [ 37 ] tracked the costs of running a simulation-based LP training and assessment session; the authors suggested that costs could be reduced by using trained non-physician raters. As we implement the instrument, and especially if we collect fresh validity evidence, we should likewise monitor costs such as money, human and non-human resources, and other practical issues.

We now compare the evidence available (the validity argument) against the evidence we identified up-front as necessary to support the desired interpretations and decisions (the interpretation-use argument).

We find reasonable scoring and generalization evidence, a gap in the extrapolation evidence (direct comparisons between simulation and real-world performance have not been done), and limited implications evidence. As is nearly always the case, the match between the interpretation-use argument and the available evidence is not perfect; some gaps remain, and some of the evidence is not as favorable as we might wish.

The final step in validation is to judge the sufficiency and suitability of evidence, i.e., whether the validity argument and the associated evidence meet the demands of the proposed interpretation-use argument.

Based on the evidence summarized above, we judge that the validity argument supports those interpretations and uses reasonably well, and the checklist appears suitable for our purposes. Moreover, the costs seem reasonable for the effort expended, and we have access to an assistant in the simulation laboratory who is keen to be trained as a rater.
We also plan to help resolve the evidence gaps noted above by conducting a research study as we implement the instrument at our institution. To buttress the extrapolation inference we plan to correlate scores from the simulation assessment with ongoing workplace-based LP assessments. We will also address the implications inference by tracking the effects of additional training for poor performing residents, i.e., the downstream consequences of assessment. Finally, we will measure the inter-rater, inter-case, and internal consistency reliability in our learner population, and will monitor costs and practical issues as noted above.

Application of the same instrument to a different setting

As a thought exercise, let us consider how the above would unfold if we wanted to use the same instrument for a different purpose and decision, for example as part of a high-stakes exam to certify postgraduate neurologist trainees as they finish residency. As our decision changes, so does our interpretation-use argument; we would now be searching for evidence that a “pass” score on the checklist indicates competence to independently perform LPs on a variety of real patients. We would require different or additional validity evidence, with increased emphasis on generalization (sampling across simulated patients that vary in age, body habitus, and other factors that influence difficulty), extrapolation (looking for stronger correlation between simulation and real-life performance), and implications evidence (e.g., evidence that we were accurately classifying learners as competent or incompetent for independent practice). We would have to conclude that the current body of evidence does not support this argument and would need to either (a) find a new instrument with evidence that meets our demands, (b) create a new instrument and start collecting evidence from scratch, or (c) collect additional validity evidence to fill in the gaps.

This thought exercise highlights two important points. First, the interpretation-use argument might change when the decision changes. Second, an instrument is not “valid” in and of itself; rather, it is the interpretations or decisions that are validated. A final judgment of validity based on the same evidence may differ for different proposed decisions.

Common mistakes to avoid in validation

In our own validation efforts [ 39 – 41 ] and in reviewing the work of others [ 9 , 25 , 42 ], we have identified several common mistakes that undermine the end-user’s ability to understand and apply the results. We present these as ten mistakes guaranteed to alarm peer reviewers, frustrate readers, and limit the uptake of an instrument.

Mistake 1. Reinvent the wheel (create a new assessment every time)

Our review [ 9 ] found that the vast majority of validity studies focused on a newly created instrument rather than using or adapting an existing instrument. Yet, there is rarely a need to start completely from scratch when initiating learner assessment, as instruments to assess most constructs already exist in some form. Using or building from an existing instrument saves the trouble of developing an instrument de novo, allows us to compare our results with prior work, and permits others to compare their work with ours and include our evidence in the overall evidence base for that instrument, task, or assessment modality. Reviews of evidence for the OSATS [ 42 ], Fundamentals of Laparoscopic Surgery (FLS) [ 43 ], and other simulation-based assessments [ 9 ] all show important gaps in the evidence base. Filling these gaps will require the collaborative effort of multiple investigators all focused on collecting evidence for the scores, inferences, and decisions derived from the same assessment.

Mistake 2. Fail to use a validation framework

As noted above, validation frameworks add rigor to the selection and collection of evidence and help identify gaps that might otherwise be missed. More important than the framework chosen is the timing (ideally early) and manner (rigorously and completely) in which the framework is applied in the validation effort.

Mistake 3. Make expert-novice comparisons the crux of the validity argument

Comparing the scores from a less experienced group against those from a more experienced group (e.g., medical students vs senior residents) is a common approach to collecting evidence of relationships with other variables—reported in 73% of studies of simulation-based assessment [ 9 ]. Yet this approach provides only weak evidence because the difference in scores may arise from a myriad of factors unrelated to the intended construct [ 44 ]. To take an extreme example for illustration, suppose an assessment intended to measure suturing ability actually measured sterile technique and completely ignored suturing. If an investigator trialed this in practice among third-year medical students and attending physicians, he would most likely find a significant difference favoring the attendings and might erroneously conclude that this evidence supports the validity of the proposed interpretation (i.e., suturing skill). Of course, in this hypothetical example, we know that attendings are better than medical students in both suturing and sterile technique. Yet, in real life, we lack the omniscient knowledge of what is actually being assessed; we only know the test scores—and the same scores can be interpreted as reflecting any number of underlying constructs. This problem of “confounding” (multiple possible interpretations) makes it impossible to say that any differences between groups are actually linked to the intended construct. On the other hand, failure to confirm expected differences would constitute powerful evidence of score invalidity.

Cook provided an extended discussion and illustration of this problem, concluding that “It is not wrong to perform such analyses, … provided researchers understand the limitations. … These analyses will be most interesting if they fail to discriminate groups that should be different, or find differences where none should exist. Confirmation of hypothesized differences or similarities adds little to the validity argument.” [ 44 ]

Mistake 4. Focus on the easily accessible validity evidence rather than the most important

Validation researchers often focus on data they have readily available or can easily collect. While this approach is understandable, it often results in abundant validity evidence being reported for one source while large evidence gaps remain for other sources that might be equally or more important. Examples include emphasizing content evidence while neglecting internal structure, reporting inter-item reliability when inter-rater reliability is more important, or reporting expert-novice comparisons rather than correlations with an independent measure to support relationships with other variables. In our review, we found that 306/417 (73%) of studies reported expert-novice comparisons, and 138 of these (45%) reported no additional evidence. By contrast, only 128 (31%) reported relationships with a separate measure, 142 (34%) reported content evidence, and 163 (39%) reported score reliability. While we do not know all the reasons for these reporting patterns, we suspect they are due at least in part to the ease with which some elements (e.g., expert-novice comparison data) can be obtained.

This underscores the importance of clearly and completely stating the interpretation-use argument, identifying existing evidence and gaps, and tailoring the collection of evidence to address the most important gaps.

Mistake 5. Focus on the instrument rather than score interpretations and uses

As noted above, validity is a property of scores, interpretations, and uses, not of instruments. The same instrument can be applied to different uses (the PSA may not be useful as a clinical screening tool, but continues to have value for monitoring prostate cancer recurrence), and much validity evidence is context-dependent. For example, score reliability can change substantially across different populations [ 44 ], an assessment designed for one learning context such as ambulatory practice may or may not be relevant in another context such as hospital or acute care medicine, and some instruments such as the OSATS global rating scale lend themselves readily to application to a new task while others such as the OSATS checklist do not [ 42 ]. Of course, evidence collected in one context, such as medical school, often has at least partial relevance to another context, such as residency training; but determinations of when and to what degree evidence transfers to a new setting are a matter of judgment, and these judgments are potentially fallible.

The interpretation-use argument cannot, strictly speaking, be appropriately made without articulating the context of intended application. Since the researcher’s context and the end-user’s context almost always differ, the interpretation-use argument necessarily differs as well. Researchers can facilitate subsequent uptake of their work by clearly specifying the context of data collection—for example, the learner group, task, and intended use/decision—and also by proposing the scope to which they believe their findings might plausibly apply.

It is acceptable to talk about the validity of scores, but for reasons articulated above, it is better to specify the intended interpretation and use of those scores, i.e., the intended decision. We strongly encourage both researchers and end-users (educators) to articulate the interpretations and uses at every stage of validation.

Mistake 6. Fail to synthesize or critique the validity evidence

We have often observed researchers merely report the evidence without any attempt at synthesis and appraisal. Both educators and future investigators greatly benefit when researchers interpret their findings in light of the proposed interpretation-use argument, integrate it with prior work to create a current and comprehensive validity argument, and identify shortcomings and persistent gaps or inconsistencies. Educators and other end-users must become familiar with the evidence as well, to confirm the claims of researchers and to formulate their own judgments of validity for their specific context.

Mistake 7. Ignore best practices for assessment development

Volumes have been written on the development, refinement, and implementation of assessment tasks, instruments, and procedures [ 23 , 45 – 48 ]. Developing or modifying an assessment without considering these best practices would be imprudent. We could not begin to summarize these, but we highlight two recommendations of particular salience to health professions educators, both of which relate to content evidence (per the classic or five sources frameworks) and the generalization inference (per Kane).

First, the sample of tasks or topics should represent the desired performance domain. A recurrent finding in health professions assessment is that there are few, if any, generalizable skills; performance on one task does not predict performance on another task [ 49 , 50 ]. Thus, the assessment must provide a sufficiently numerous and broad sample of scenarios, cases, tasks, stations, etc.

Second, the assessment response format should balance objectification and judgment or subjectivity [ 51 ]. The advantages and disadvantages of checklists and global ratings have long been debated, and it turns out that both have strengths and weaknesses [ 52 ]. Checklists outline specific criteria for desired behaviors and guidance for formative feedback, and as such can often be used by raters less familiar with the assessment task. However, the “objectivity” of checklists is largely an illusion; [ 53 ] correct interpretation of an observed behavior may yet require task-relevant expertise, and forcing raters to dichotomize ratings may result in a loss of information. Moreover, a new checklist must be created for each specific task, and the items often reward thoroughness at the expense of actions that might more accurately reflect clinical competence. By contrast, global ratings require greater expertise to use but can measure more subtle nuances of performance and reflect multiple complementary perspectives. Global ratings can also be designed for use across multiple tasks, as is the case for the OSATS. In a recent systematic review, we found slightly higher inter-rater reliability for checklists than for global ratings when averaged across studies, while global ratings had higher average inter-item and inter-station reliability [ 52 ]. Qualitative assessment offers another option for assessing some learner attributes [ 11 , 54 , 55 ].

Mistake 8. Omit details about the instrument

It is frustrating to identify an assessment with relevance to local needs and validity evidence supporting intended uses, only to find that the assessment is not specified with sufficient detail to permit application. Important omissions include the precise wording of instrument items, the scoring rubric, instructions provided to either learners or raters, and a description of station arrangements (e.g., materials required in a procedural task, participant training in a standardized patient encounter) and the sequence of events. Most researchers want others to use their creations and cite their publications; this is far more likely to occur if needed details are reported. Online appendices provide an alternative to print publication if article length is a problem.

Mistake 9. Let the availability of the simulator/assessment instrument drive the assessment

Too often as educators, we allow the availability of an assessment tool to drive the assessment process, such as taking an off-the-shelf MCQ exam for an end-of-clerkship assessment when a performance-based assessment might better align with clerkship objectives. This issue is further complicated with simulation-based assessments, where the availability of a simulator may drive the educational program as opposed to designing the educational program and then choosing the best simulation to fit the educational needs [ 56 ]. We should align the construct we are teaching with the simulator and assessment tool that best assess that construct.

Mistake 10. Label an instrument as validated

There are three problems with labeling an instrument as validated. First, validity is a property of scores, interpretations, and decisions, not instruments. Second, validity is a matter of degree—not a yes or no decision. Third, validation is a process, not an endpoint. The word validated means only that a process has been applied; it does not provide any details about that process nor indicate the magnitude or direction (supportive or opposing) of the empiric findings.

The future of simulation-based assessment

Although we do not pretend to know the future of simulation-based assessment, we conclude with six aspirational developments we hope come to pass.

  • We hope to see greater use of simulation-based assessment as part of a suite of learner assessments. Simulation-based assessment should not be a goal in and of itself, but we anticipate more frequent assessment in general and believe that simulation will play a vital role. The choice of modality should first consider what is the best assessment approach in a given situation, i.e., learning objective, learner level, or educational context. Simulation in its various forms will often be the answer, especially in skill assessments requiring standardization of conditions and content.
  • We hope that simulation-based assessment will focus more clearly on educational needs and less on technology. Expensive manikins and virtual reality task trainers may play a role, but pigs feet, Penrose drains, wooden pegs, and cardboard manikins may actually offer more practical utility because they can be used with greater frequency and with fewer constraints. For example, such low-cost models can be used at home or on the wards rather than in a dedicated simulation center. As we consider the need for high-value, cost-conscious education [ 57 ], we encourage innovative educators to actively seek low-tech solutions.
  • Building off the first two points, we hope to see less expensive, less sophisticated, less intrusive, lower-stakes assessments take place more often in a greater variety of contexts, both simulated and in the workplace. As Schuwirth and van der Vleuten have proposed [ 58 ], this model would—over time—paint a more complete picture of the learner than any single assessment, no matter how well-designed, could likely achieve.
  • We hope to see fewer new assessment instruments created and more evidence collected to support and adapt existing instruments. While we appreciate the forces that might incentivize the creation of novel instruments, we believe that the field will advance farther and faster if researchers pool their efforts to extend the validity evidence for a smaller subset of promising instruments, evaluating such instruments in different contexts, and successively filling in evidence gaps.
  • We hope to see more evidence informing the consequences and implications of assessment. This is probably the most important evidence source, yet it is among the least often studied. Suggestions for the study of the consequences of assessment have recently been published [ 27 ].
  • Finally, we hope to see more frequent and more explicit use of the interpretation-use argument. As noted above, this initial step is difficult but vitally important to meaningful validation.

Acknowledgements

Availability of data and materials, authors’ contributions.

Authors DAC and RH jointly conceived this work. DAC drafted the initial manuscript, and both authors revised the manuscript for important intellectual content and approved the final version.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Ethics approval and consent to participate, contributor information.

David A. Cook, Phone: 507-266-4156, Email: [email protected] .

Rose Hatala, Email: moc.cam@alatahr .

assignment process & validations

On Target Work Skills

Training trainers since 1986

How to conduct assessment validation (Part 1)

assignment process & validations

Introduction to assessment validation

Validation is defined as the quality review of the assessment process. It involves checking that the assessment tool produces valid, reliable, sufficient, current and authentic evidence to enable reasonable judgements to be made as to whether the requirements of the training package or VET accredited courses are met. It includes reviewing a statistically valid sample of the assessments and making recommendations for future improvements to the assessment tool, process and/or outcomes and acting upon such recommendations. [1]

Assessment validation has two distinct parts:

  • Part 1. Check the assessment tool for compliance
  • Part 2. Review a sample of the assessments.

This article covers the first part only.

If you want to know more about the second part, then I recommend reading the information published by ASQA about how to conduct assessment validation. This information covers: [2]

  • Who conducts validation?
  • Scheduling validation
  • Statistically valid sampling and randomly selecting samples to be validated
  • Reviewing assessment practice
  • Reviewing assessment judgements
  • Validation outcomes and the implementation of recommendations for improvement.

The assessment tool must be checked to ensure it complies with the requirements specified by the Standards for RTOs, in particular: [3]

  • Compliance with the principles of assessment and the rules of evidence
  • Compliance with the requirements specified by the training package or VET accredited course.

The following 6-step process can be used to check the assessment tool for compliance:

Step 1. Read the assessment requirements

Step 2. review the assessment plan, step 3. review the assessment matrix (mapping), step 4. check the details about how the knowledge evidence is planned to be being gathered, step 5. check the details about how the performance evidence is planned to be being gathered.

  • Step 6. Check the overall quality of the assessment tool.

This is a quick step to perform. You will read and re-read the unit of competency and its assessment requirements many times during the assessment validation process. During this first step, have a quick read of the assessment requirements and answer the following questions:

  • What is the volume or frequency of performance evidence?
  • Is the location, facilities, equipment, or other assessment conditions specified?

This step should also be quick. The purpose of this step is to get an overview of what is the planned assessment approach During this second step, answer the following questions:

  • Has the correct unit code and title been used?
  • How many assessment tasks are planned?
  • Is there a plan to gather the knowledge evidence?
  • Does there appear to be sufficient assessment tasks for gathering the volume or frequency of performance evidence?
  • Does the planned assessment approach seem to be simple or complex?

Note: This planned assessment approach may be found in the Training and Assessment Strategy (TAS) or other documents covering how the RTO plans to implement the delivery of the training and assessment for a unit or cluster of units.

This step should be a relatively quick step. The assessment matrix is an important document used to display how the RTO plans to gather evidence that comply with the requirements specified by the training package or VET accredited course. The assessment matrix will be used during Step 4 and Step 5 to cross-check the RTO’s planned assessment approach and the assessment instruments being used to gather evidence.

During this third step, answer the following questions:

  • Has the entire unit of competency and its assessment requirements been copied into the matrix? Are the number of items the same? For example, if the unit has five elements does the matrix have five elements? And scan the wording to ensure the matrix has the exact words as the unit of competency and its assessment requirements.
  • Is there one column for each planned assessment task?
  • Are the titles or descriptions of the assessment tasks the same in the assessment plan and assessment matrix?
  • Is every item from the unit of competency and its assessment requirements planned to be assessed? For example, is there at least one ‘tick’ in every row?

Note: Some assessment matrices will provide information or numerical indicator about the assessment item instead of using a ‘tick’. For example, the matrix may indicate that a piece of knowledge evidence will be gather by Question 1.

This step requires an attention to details. The purpose is to ensure that the assessment tool will gather the required knowledge evidence. During this fourth step, answer the following questions:

  • Is there an assessment instrument for gathering the knowledge evidence?
  • Are the instructions to the assessor clear and concise?
  • Are the instructions to the candidate clear and concise?
  • Is the structure, format, and layout of the assessment instrument easy to follow? This includes headings, sub-headings, page numbers, and numbering of questions.
  • Is there consistency between the assessment plan, assessment matrix and assessment instrument? For example, if the assessment plan states that there are 17 questions, does the assessment instrument have 17 questions?
  • Is every item of knowledge evidence being adequately gathered? A judgement about ‘adequately’ will need to be made.

This step requires an attention to details and it can take time to examine the assessment documents for compliance. The purpose is to ensure that the assessment tool will gather the required performance evidence. During this fifth step, answer the following questions:

  • Is there one or more assessment instruments for gathering the performance evidence?
  • Are the assessment conditions compliant with those stated in the Assessment Requirements for the unit of competency? This may include assessment location, facilities, equipment, and access to specified documents. For example, if the assessment conditions state that the assessment occurs in the workplace, then the assessment tasks must state that the evidence must be gathered from a workplace (not from a simulated workplace).
  • Are the items of performance evidence clearly listed or identified?
  • Is the structure, format, and layout of the assessment instrument or instruments easy to follow? This includes headings, sub-headings, and page numbers.
  • Is there consistency between the assessment plan, assessment matrix and assessment instrument? For example, if the assessment matrix states that evidence for Performance Criteria 1.1 will be gather during Assessment Task 2, then Assessment Task 2 must cover the gathering of evidence for Performance Criteria 1.1.
  • Is every item of performance evidence being adequately gathered? A judgement about ‘adequately’ will need to be made. This includes a check that the amount of evidence being gathered is compliant with the specified volume or frequency of performance evidence.

Note: Verbs are important. For example, if performance criteria says, ‘negotiate and agree with a supervisor’, then there needs to be evidence that the candidate has negotiated and agreed with a supervisor’. Also, the letter ‘s’ is important. A item of performance evidence may specify plural rather than singular. For example, if it states ‘write reports’, then more than one written report is required for evidence.

Step 6. Check the overall quality of the assessment tool

This step can take time to examine the assessment tool for compliance, readability, and usability.

  • Are there sample answers and assessment decision criteria for assessors?
  • Is the structure, format, and layout of all assessment documents easy to follow?
  • Are all instructions written clearly and concisely?
  • Are there any grammar, spelling and typo errors?
  • Is there a list of all the assessment documents required for the assessor?
  • Does the assessment tool have all the documents required for the assessor?
  • Is there a list of all the assessment documents required for the candidate?
  • Does the assessment tool have all the documents required for the candidate?
  • Has the correct unit code and title been used throughout all the assessment documents? This may include release number.
  • Do all the assessment documents have version control information?

In conclusion

  • Part 2. Review a sample of the assessments

Assessment validation can be time-consuming and mind-bending.

Preparation before an assessment validation meeting can reduce the time at the assessment validation meeting. However, you can expect a typical assessment validation meeting to require anywhere between a few hours and an entire day. The duration of the assessment validation meeting can depend on the quality of the assessment tool and number of assessment samples to be reviewed. I regularly see poor quality assessment tools, and it takes time to properly check large numbers of assessment samples.

Clear and critical thinking is required by people participating in an assessment validation meeting. There are usually many documents to be reviewed and checked. Printing paper copies of documents (or some documents) and using ‘split screens’ on computers will help when comparing information from two or more documents, such as:

  • unit of competency
  • assessment requirements
  • assessment plan
  • assessment matrix
  • assessment instructions
  • assessment instruments.

Frustration and fatigue can be experienced during long assessment validation meetings. Breaks will be needed (and sometimes chocolate helps). It is a good idea to assign an experienced VET practitioner to lead the assessment validation meeting.

[1] https://www.asqa.gov.au/standards/appendices/glossary accessed 2 September 2021

[2] https://www.asqa.gov.au/resources/fact-sheets/conducting-validation accessed 2 September 2021

[3] https://www.asqa.gov.au/standards/training-assessment/clauses-1.8-to-1.12 accessed 2 September 2021

Do you need help with your TAE studies?

Are you doing the TAE40116 or TAE40122 Certificate IV in Training and Assessment , and are you struggling with your studies? Do you want help with your TAE40116 or TAE40122 studies?

assignment process & validations

Ring Alan Maguire on 0493 065 396 to discuss.

Contact now!

logo otws

Share this:

' src=

Author: Alan Maguire

35+ years experience as a trainer, instructional designer, quality manager, project manager, program manager, RTO auditor, RTO manager and VET adviser. View all posts by Alan Maguire

4 thoughts on “How to conduct assessment validation (Part 1)”

Greetings Alan, A very professional and logical progression of the vexed issue of undertaking assessment. Congratulations. Well done. Derek Bailey

Like Liked by 1 person

Great read, I personally validated everything from the inception of an assessment tool and instrument to the usage and end result. Have proper polices and procedures, forms to back it up and a follow through that everyone can work with (not complicated) and easy to understand and apply.

I validated everything, the quota thing is problematic. I validated pre – when the assessment tool and instrument were in use, during the process of usage and after use. Use a reliable industry contact and review team.

  • Pingback: What is pre-assessment validation? – On Target Work Skills

Leave a comment Cancel reply

' src=

  • Already have a WordPress.com account? Log in now.
  • Subscribe Subscribed
  • Copy shortlink
  • Report this content
  • View post in Reader
  • Manage subscriptions
  • Collapse this bar
  • Marketing Cloud

Experiences

Access Trailhead, your Trailblazer profile, community, learning, original series, events, support, and more.

Search Tips:

  • Please consider misspellings
  • Try different search keywords

Triggers and Order of Execution

When you save a record with an insert , update , or upsert statement, Salesforce performs a sequence of events in a certain order.

Before Salesforce executes these events on the server, the browser runs JavaScript validation if the record contains any dependent picklist fields. The validation limits each dependent picklist field to its available values. No other validation occurs on the client side.

For a diagrammatic representation of the order of execution, see Order of Execution Overview on the Salesforce Architects site. The diagram is specific to the API version indicated on it, and can be out-of-sync with the information here. This Apex Developer Guide page contains the most up-to-date information on the order of execution for this API version. To access a different API version, use the version picker for the Apex Developer Guide .

On the server, Salesforce performs events in this sequence.

  • Loads the original record from the database or initializes the record for an upsert statement.

Salesforce performs different validation checks depending on the type of request.

  • Compliance with layout-specific rules
  • Required values at the layout level and field-definition level
  • Valid field formats
  • Maximum field length

Additionally, if the request is from a User object on a standard UI edit page, Salesforce runs custom validation rules.

  • For requests from multiline item creation such as quote line items and opportunity line items, Salesforce runs custom validation rules.
  • For requests from other sources such as an Apex application or a SOAP API call, Salesforce validates only the foreign keys and restricted picklists. Before executing a trigger, Salesforce verifies that any custom foreign keys don’t refer to the object itself.
  • Executes record-triggered flows that are configured to run before the record is saved.
  • Executes all before triggers.
  • Runs most system validation steps again, such as verifying that all required fields have a non- null value, and runs any custom validation rules. The only system validation that Salesforce doesn't run a second time (when the request comes from a standard UI edit page) is the enforcement of layout-specific rules.
  • Executes duplicate rules. If the duplicate rule identifies the record as a duplicate and uses the block action, the record isn’t saved and no further steps, such as after triggers and workflow rules, are taken.
  • Saves the record to the database, but doesn't commit yet.
  • Executes all after triggers.
  • Executes assignment rules.
  • Executes auto-response rules.

This sequence applies only to workflow rules.

  • Updates the record again.
  • Runs system validations again. Custom validation rules, flows, duplicate rules, processes, and escalation rules aren’t run again.
  • Executes before update triggers and after update triggers, regardless of the record operation (insert or update), one more time (and only one more time)
  • Executes escalation rules.
  • Flows launched by processes
  • Flows launched by workflow rules (flow trigger workflow actions pilot)

When a process or flow executes a DML operation, the affected record goes through the save procedure.

  • Executes record-triggered flows that are configured to run after the record is saved
  • Executes entitlement rules.
  • If the record contains a roll-up summary field or is part of a cross-object workflow, performs calculations and updates the roll-up summary field in the parent record. Parent record goes through save procedure.
  • If the parent record is updated, and a grandparent record contains a roll-up summary field or is part of a cross-object workflow, performs calculations and updates the roll-up summary field in the grandparent record. Grandparent record goes through save procedure.
  • Executes Criteria Based Sharing evaluation.
  • Commits all DML operations to the database.
  • Sending email
  • Enqueued asynchronous Apex jobs, including queueable jobs and future methods
  • Asynchronous paths in record-triggered flows

During a recursive save, Salesforce skips steps 9 (assignment rules) through 17 (roll-up summary field in the grandparent record).

Additional Considerations

Note these considerations when working with triggers.

  • If a workflow rule field update is triggered by a record update, Trigger.old doesn’t hold the newly updated field by the workflow after the update. Instead, Trigger.old holds the object before the initial record update was made. For example, an existing record has a number field with an initial value of 1. A user updates this field to 10, and a workflow rule field update fires and increments it to 11. In the update trigger that fires after the workflow field update, the field value of the object obtained from Trigger.old is the original value of 1, and not 10. See Trigger.old values before and after update triggers.
  • If a DML call is made with partial success allowed, triggers are fired during the first attempt and are fired again during subsequent attempts. Because these trigger invocations are part of the same transaction, static class variables that are accessed by the trigger aren't reset. See Bulk DML Exception Handling .
  • If more than one trigger is defined on an object for the same event, the order of trigger execution isn't guaranteed. For example, if you have two before insert triggers for Case and a new Case record is inserted. The firing order of these two triggers isn’t guaranteed.
  • To learn about the order of execution when you insert a non-private contact in your org that associates a contact to multiple accounts, see AccountContactRelation .
  • To learn about the order of execution when you’re using before triggers to set Stage and Forecast Category , see Opportunity .
  • In API version 53.0 and earlier, after-save record-triggered flows run after entitlements are executed.
  • Salesforce Help : Triggers for Autolaunched Flows

Comparing Default Account Assignment, Validation, and Substitution rules

After completing this lesson, you will be able to:

  • Differentiate between default account assignments, validation, and substitution rules

Tool Comparison

Having discussed default account assignment, validation, and substitution rules, we can make a quick comparison. Although distinct in their role and function, these three tools work together to ensure the accuracy and efficiency of fiscal operations.

assignment process & validations

All tools help to streamline accounting processes, however, their goals and use cases differ based on the level of complexity required and the specific operational rules of the business relating to cost accounting:

  • Default account assignment proposes default values when using specific accounts. It's utilized for automatically directing certain routine transactions to specific account assignment objects, such as posting office expenses to cost centers.
  • Validation is used as an automated checking tool ensuring financial data meets predefined conditions. If a transaction violates a validation rule – for example, when a false cost center is entered – the system outputs an error or warning message.
  • Substitution uses rules to complete or replace specified input values automatically with other values based on defined conditions. This helps in performing more complex assignments, reducing manual errors, and ensuring data adheres to certain standards.

In a nutshell, while they all aim to enhance the accuracy of financial data, default account assignment simplifies the process by automatic assignment, validation ensures compliance with set criteria, and substitution maintains data standards by auto-replacing certain inputs.

Log in to track your progress & complete quizzes

Salesforcean

Tuesday, october 31, 2017, assignment process & validations, managing assignment rules.

  • Lead Assignment Rules —Specify how leads are assigned to users or queues as they are created manually, captured from the web, or imported via the Data Import Wizard.
  • Case Assignment Rules —Determine how cases are assigned to users or put into queues as they are created manually, using Web-to-Case, Email-to-Case, On-Demand Email-to-Case, the Self-Service portal, the Customer Portal, Outlook, or Lotus Notes.

Sample Assignment Rule

Validation rules.

  • The user chooses to create a new record or edit an existing record.
  • The user clicks  Save .
  • If all data is valid, the record is saved.
  • If any data is invalid, the associated error message displays without saving the record.
  • The user makes the necessary changes and clicks  Save  again.

assignment process & validations

  • Managing Validation Rules
  • Define Validation Rules Validation rules verify that the data a user enters in a record meets the standards you specify before the user can save the record. A validation rule can contain a formula or expression that evaluates the data in one or more fields and returns a value of “True” or “False”. Validation rules also include an error message to display to the user when the rule returns a value of “True” due to an invalid value.
  • Clone Validation Rules
  • Activate Validation Rules
  • Validation Rules Fields
  • Tips for Writing Validation Rules
  • Validation Rule Considerations Validation rules verify that the data a user enters in a record meets the standards you specify before the user can save the record. A validation rule can contain a formula or expression that evaluates the data in one or more fields and returns a value of “True” or “False”. Validation rules also include an error message to display to the user when the rule returns a value of “True” due to an invalid value. Review these considerations before implementing validation rules in your organization.

No comments:

Post a comment, lightning inter-component communication patterns.

Lightning Inter-Component Communication Patterns If you’re comfortable with how a Lightning Component works and want to build producti...

assignment process & validations

  • Batch Apex Governor Limits & Best Practices Batch Apex Governor Limits Keep in mind the following governor limits for batch Apex. Up to 5 batch jobs can be queued or active con...
  • Sample Trigger Scenarios of Salesforce Trigger Scenario 1: Create “Top X Designation” custom object which is the related list to Opportunity (Look up Relationship). In the Top...

' border=

IMAGES

  1. Free ISO 13485 Process Validation Template

    assignment process & validations

  2. Process Validation: The Essential Guide to Ensuring Product Quality and

    assignment process & validations

  3. How to create a Validation Master Plan in 5 steps. Templates & more

    assignment process & validations

  4. Purpose of Process Validation

    assignment process & validations

  5. Pyramid With 7 Levels Of Process Validation

    assignment process & validations

  6. PROCESS VALIDATION

    assignment process & validations

VIDEO

  1. Java Institute || WP 1 || Task 2 || Rasith Munasinghe

  2. Task 2

  3. Web programing-Task 2

  4. assignment process percentage piece pack due dealer chapter

  5. E-Shop

  6. Program the user sign up process including input validations

COMMENTS

  1. PDF Process Validation: General Principles and Practices

    FDA regulations require that process validation procedures be established and followed (§ 211.100) before a batch can be distributed (§§ 211.22 and 211.165). routine production. It should also ...

  2. Biopharmaceutical Manufacturing Process Validation and Quality Risk

    25 Process validation today is a continual, risk-based, quality-focused exercise that encompasses the entire product life cycle.. Manufacturing processes for biopharmaceuticals must be designed to produce products that have consistent quality attributes. This entails removing impurities and contaminants that include endotoxins, viruses, cell membranes, nucleic acids, proteins, culture media ...

  3. Process Validation in the Pharmaceutical Industry

    Process validation is a step-by-step procedure designed to ensure that a manufacturing process can consistently produce quality products. It is performed by a validation team led by the quality assurance head of manufacturers in the pharmaceutical industry. Generally, process validation is done before releasing a new product, when applying any ...

  4. PDF Process Validation

    Stage 2: Process. Stage 3: Continued Process Verification. Stage 1: Design Qualification (DQ) Equipment design and selection based on your needs. Define user, functional, and operational requirements. Ensure the equipment is designed correctly and will have the appropriate functionality. Lack of DQ = deficient equipment that can have issues ...

  5. Model Validation and Reasonableness Checking/Assignment

    Assignment validation is an important step in validating not only the assignment process but the entire modeling process. Assignment validation typically benefits from a wealth of independent validation data including traffic counts and transit boardings collected independently of household or other survey data used for model estimation and ...

  6. PDF ICH Q7 Chapter 12 & 19.6: Process Validation

    Definitions on Validation. As defined in ICH Q7. -'Establishing documented evidence which provides a high degree of assurance that a specific process will consistently produce a product meeting its pre-determined specifications and quality attributes.'(12.40) As defined in ICH Q8(R2)/Q11. Continuous Process Verification.

  7. Process Validation : New Approach (SOP / Protocol)

    Process Validation: Establishing documented evidence through collection and evaluation of data from the process design stage to routine production, which establishes scientific evidence and provides a high degree of assurance that a process is capable of consistently yield products meeting pre-determined specifications and quality attributes. SOP and Protocol for Process Validation of Drug Product

  8. Processes, auto-response rules, validation rules, assignment rules, and

    Validation Rules. Set up Assignment Rules. Set Up Auto-Response Rules. Tips for Working with Picklist and Multi-Select Picklist Formula Fields. Time triggers and time-dependent considerations. Knowledge Article Number. 000387481. This is a compilation of article links for workflows and processes.

  9. The Four Types of Process Validation

    The process validation activities can be described in three stages. Stage 1 - Process Design: The commercial process is defined during this stage based on knowledge gained through development and scale-up activities.. Stage 2 - Process Qualification: During this stage, the process design is confirmed as being capable of reproducible commercial manufacturing.

  10. 6 The Concept of Validity and the Process of Validation

    Validity, according to the 1999 Standards for Educational and Psychological Testing, is "the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (p. 9).Although issues of validity are relevant to all fields of psychological measurement, our focus in this chapter is on the concept of validity and the process of validation as it ...

  11. A Step-by-Step Guide to Analytical Method Development and Validation

    Method Validation Overview: Method validation is the process of demonstrating that an analytical method is suitable for its intended use, and that it is capable of producing reliable and consistent results over time. The validation process involves a set of procedures and tests designed to evaluate the performance characteristics of the method.

  12. Analytical Method Validation for Quality Assurance and Process

    Method validation is a critical activity in the pharmaceutical industry. Validation data are used to confirm that the analytical procedure employed for a specific test is suitable for its intended purposes. These results demonstrate the performance, consistency, and reliability of the analytical method. This paper summarizes the requirements of method validation and data generation to document ...

  13. Validation of educational assessments: a primer for simulation and

    Validation is a process. Validation refers to the process of collecting validity evidence to evaluate the appropriateness of the interpretations, uses, and decisions based on assessment results . This definition highlights several important points. First, validation is a process not an endpoint.

  14. How to conduct assessment validation (Part 1)

    The following 6-step process can be used to check the assessment tool for compliance: Step 1. Read the assessment requirements. Step 2. Review the assessment plan. Step 3. Review the assessment matrix (mapping) Step 4. Check the details about how the knowledge evidence is planned to be being gathered.

  15. PDF SAM.gov Entity Validation

    What Is Entity Validation? (1 of 2) The validation process is a critical piece of the federal awards ecosystem. It prevents improper payments, procurement fraud, and helps ensure the integrity of government contracts and grants processes, representing trillions of dollars in taxpayer funds each year. SAM.gov uses an entity validation service ...

  16. Triggers and Order of Execution

    Triggers and Order of Execution. When you save a record with an insert, update, or upsert statement, Salesforce performs a sequence of events in a certain order. Before Salesforce executes these events on the server, the browser runs JavaScript validation if the record contains any dependent picklist fields.

  17. How to Validate Process Maps: A Guide for Process Designers

    Process map validation is an ongoing and iterative process that requires regular review and revision. To maintain and update your process maps, you need to assign a person or team responsible for ...

  18. Comparing Default Account Assignment, Validation, and Substitutio

    Default account assignment proposes default values when using specific accounts. It's utilized for automatically directing certain routine transactions to specific account assignment objects, such as posting office expenses to cost centers. Validation is used as an automated checking tool ensuring financial data meets predefined conditions. If ...

  19. Assignment Center

    Assignment Center is a web portal that allows users to access and manage patent and trademark assignments online. Users can search, record, and review assignments, as well as download forms and instructions. Assignment Center also provides links to FAQs and other resources related to patent and trademark assignments.

  20. Assign Validation Methods

    For the 01 and 02 task types, you can assign a validation method to each consolidation unit, while for the 03 task type, you can assign a validation method to each consolidation group. Use the following procedure to assign methods to consolidation units or groups: Select a consolidation version in the Version field. The method assignment is version-dependent, that is, you can assign different ...

  21. Assignment (law)

    Assignment (law) Assignment [1] is a legal term used in the context of the laws of contract and of property. In both instances, assignment is the process whereby a person, the assignor, transfers rights or benefits to another, the assignee. [2] An assignment may not transfer a duty, burden or detriment without the express agreement of the assignee.

  22. How to Skip Field level validations, When I Completed an assignment in

    I you want to proceed with the process flow, use customized buttons one which calls the finish assignment without the validations (Obj-validate) moves to the next; and the other which finishes the assignments with all validations. You can set some property on click of each button to get differentiated between the buttons.

  23. Salesforcean: Assignment Process & Validations

    Assignment Process & Validations Assignment Rule: In terms of assigning Leads and cases, you can set the assignment rules so based on the criteria you can assign them to either a user or queue. This will help to auto routing of leads & cases as their volume is large and don;t need manual intervention.