  • Section 1. A Framework for Program Evaluation: A Gateway to Tools

This section is adapted from the article "Recommended Framework for Program Evaluation in Public Health Practice," by Bobby Milstein, Scott Wetterhall, and the CDC Evaluation Working Group.

Around the world, there exist many programs and interventions developed to improve conditions in local communities. Communities come together to reduce the level of violence that exists, to work for safe, affordable housing for everyone, or to help more students do well in school, to give just a few examples.

But how do we know whether these programs are working? If they are not effective - and even if they are - how can we improve them for local communities? And finally, how can an organization make intelligent choices about which promising programs are likely to work best in its community?

In recent years, there has been a growing trend toward the better use of evaluation to understand and improve practice. The systematic use of evaluation has solved many problems and helped countless community-based organizations do what they do better.

Despite an increased understanding of the need for - and the use of - evaluation, however, a basic agreed-upon framework for program evaluation has been lacking. In 1997, scientists at the United States Centers for Disease Control and Prevention (CDC) recognized the need to develop such a framework. As a result, the CDC assembled an Evaluation Working Group made up of experts in the fields of public health and evaluation. Members were asked to develop a framework that summarizes and organizes the basic elements of program evaluation. This Community Tool Box section describes the framework resulting from the Working Group's efforts.

Before we begin, however, we'd like to offer some definitions of terms that we will use throughout this section.

By evaluation, we mean the systematic investigation of the merit, worth, or significance of an object or effort. Evaluation practice has changed dramatically during the past three decades - new methods and approaches have been developed and it is now used for increasingly diverse projects and audiences.

Throughout this section, the term program is used to describe the object or effort that is being evaluated. It may apply to any action with the goal of improving outcomes for whole communities, for more specific sectors (e.g., schools, work places), or for sub-groups (e.g., youth, people experiencing violence or HIV/AIDS). This definition is meant to be very broad.

Examples of different types of programs include:

  • Direct service interventions (e.g., a program that offers free breakfast to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., organizing a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether inequities in health outcomes based on race can be reduced)
  • Surveillance systems (e.g., whether early detection of school readiness improves educational outcomes)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Social marketing campaigns (e.g., a campaign in the Third World encouraging mothers to breast-feed their babies to reduce infant mortality)
  • Infrastructure building projects (e.g., a program to build the capacity of state agencies to support community development initiatives)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)
  • Administrative systems (e.g., an incentive program to improve efficiency of health services)

Program evaluation - the type of evaluation discussed in this section - is an essential organizational practice for all types of community health and development work. It is a way to evaluate the specific projects and activities community groups may take part in, rather than to evaluate an entire organization or comprehensive community initiative.

By stakeholders, we mean those who care about the program or effort. These may include those presumed to benefit (e.g., children and their parents or guardians), those with particular influence (e.g., elected or appointed officials), and those who might support the effort (i.e., potential allies) or oppose it (i.e., potential opponents). Key questions in thinking about stakeholders are: Who cares? What do they care about?

This section presents a framework that promotes a common understanding of program evaluation. The overall goal is to make it easier for everyone involved in community health and development work to evaluate their efforts.

Why evaluate community health and development programs?

The type of evaluation we talk about in this section can be closely tied to everyday program operations. Our emphasis is on practical, ongoing evaluation that involves program staff, community members, and other stakeholders, not just evaluation experts. This type of evaluation offers many advantages for community health and development professionals.

For example, it complements program management by:

  • Helping to clarify program plans
  • Improving communication among partners
  • Gathering the feedback needed to improve and be accountable for program effectiveness

It's important to remember, too, that evaluation is not a new activity for those of us working to improve our communities. In fact, we assess the merit of our work all the time when we ask questions, consult partners, make assessments based on feedback, and then use those judgments to improve our work. When the stakes are low, this type of informal evaluation might be enough. However, when the stakes are raised - when a good deal of time or money is involved, or when many people may be affected - then it may make sense for your organization to use evaluation procedures that are more formal, visible, and justifiable.

How do you evaluate a specific program?

Before your organization starts with a program evaluation, your group should be very clear about the answers to the following questions:

  • What will be evaluated?
  • What criteria will be used to judge program performance?
  • What standards of performance on the criteria must be reached for the program to be considered successful?
  • What evidence will indicate performance on the criteria relative to the standards?
  • What conclusions about program performance are justified based on the available evidence?

To clarify the meaning of each, let's look at some of the answers for Drive Smart, a hypothetical program begun to stop drunk driving.

What will be evaluated?

  • Drive Smart, a program focused on reducing drunk driving through public education and intervention.

What criteria will be used to judge program performance?

  • The number of community residents who are familiar with the program and its goals
  • The number of people who use "Safe Rides" volunteer taxis to get home
  • The percentage of people who report drinking and driving
  • The reported number of single-car nighttime crashes (a common way to try to determine whether the number of people who drive drunk is changing)

What standards of performance on the criteria must be reached for the program to be considered successful?

  • 80% of community residents will know about the program and its goals after the first year of the program
  • The number of people who use the "Safe Rides" taxis will increase by 20% in the first year
  • The percentage of people who report drinking and driving will decrease by 20% in the first year
  • The reported number of single-car nighttime crashes will decrease by 10% in the program's first two years

What evidence will indicate performance on the criteria relative to the standards?

  • A random telephone survey will demonstrate community residents' knowledge of the program and changes in reported behavior
  • Logs from "Safe Rides" will tell how many people use their services
  • Information on single-car nighttime crashes will be gathered from police records

What conclusions about program performance are justified based on the available evidence?

  • Are the changes we have seen in the level of drunk driving due to our efforts, or to something else? Or (if there is no or insufficient change in behavior or outcomes):
  • Should Drive Smart change what it is doing, or have we just not waited long enough to see results?
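The Drive Smart answers above can be sketched as a small data structure that pairs each criterion with its standard of performance and evidence source, then checks hypothetical first-year results against the targets. All names and numbers below are illustrative, not from a real program:

```python
# A sketch of the Drive Smart evaluation plan: each criterion pairs a
# performance standard with its evidence source. All figures are hypothetical.

plan = {
    "program awareness": {
        "standard": "80% of residents know the program after year one",
        "evidence": "random telephone survey",
        "target": 0.80,          # proportion aware
        "higher_is_better": True,
    },
    "Safe Rides usage": {
        "standard": "rides increase 20% in year one",
        "evidence": "Safe Rides logs",
        "target": 0.20,          # relative increase
        "higher_is_better": True,
    },
    "self-reported drunk driving": {
        "standard": "reports decrease 20% in year one",
        "evidence": "random telephone survey",
        "target": -0.20,         # relative decrease
        "higher_is_better": False,
    },
}

def met(criterion: str, observed: float) -> bool:
    """Was the performance standard reached for this criterion?"""
    entry = plan[criterion]
    if entry["higher_is_better"]:
        return observed >= entry["target"]
    return observed <= entry["target"]

# Hypothetical first-year results
print(met("program awareness", 0.83))            # survey found 83% aware
print(met("Safe Rides usage", 0.12))             # rides up only 12%
print(met("self-reported drunk driving", -0.25)) # reports down 25%
```

Writing the plan down this explicitly - criteria, targets, and evidence sources in one place - makes it easy for stakeholders to spot a standard that has no evidence source, or evidence that answers no criterion.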

The following framework provides an organized approach to answer these questions.

A framework for program evaluation

Program evaluation offers a way to understand and improve community health and development practice using methods that are useful, feasible, proper, and accurate. The framework described below is a practical non-prescriptive tool that summarizes in a logical order the important elements of program evaluation.

The framework contains two related dimensions:

  • Steps in evaluation practice, and
  • Standards for "good" evaluation.

The six connected steps of the framework are actions that should be a part of any evaluation. Although in practice the steps may be encountered out of order, it will usually make sense to follow them in the recommended sequence. That's because earlier steps provide the foundation for subsequent progress. Thus, decisions about how to carry out a given step should not be finalized until prior steps have been thoroughly addressed.

However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's unique context (for example, the program's history and organizational climate) is essential for sound evaluation. They are intended to serve as starting points around which community organizations can tailor an evaluation to best meet their needs.

  • Engage stakeholders
  • Describe the program
  • Focus the evaluation design
  • Gather credible evidence
  • Justify conclusions
  • Ensure use and share lessons learned

Understanding and adhering to these basic steps will improve most evaluation efforts.

The second part of the framework is a basic set of standards to assess the quality of evaluation activities. There are 30 specific standards, organized into the following four groups:

  • Utility
  • Feasibility
  • Propriety
  • Accuracy

These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They are recommended as the initial criteria by which to judge the quality of program evaluation efforts.

Engage Stakeholders

Stakeholders are people or organizations that have something to gain or lose from what will be learned from an evaluation, and also in what will be done with that knowledge. Evaluation cannot be done in isolation. Almost everything done in community health and development work involves partnerships - alliances among different organizations, board members, those affected by the problem, and others. Therefore, any serious effort to evaluate a program must consider the different values held by the partners. Stakeholders must be part of the evaluation to ensure that their unique perspectives are understood. When stakeholders are not appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted.

However, if they are part of the process, people are likely to feel a good deal of ownership for the evaluation process and results. They will probably want to develop it, defend it, and make sure that the evaluation really works.

That's why this evaluation cycle begins by engaging stakeholders. Once involved, these people will help to carry out each of the steps that follow.

Three principal groups of stakeholders are important to involve:

  • People or organizations involved in program operations may include community members, sponsors, collaborators, coalition partners, funding officials, administrators, managers, and staff.
  • People or organizations served or affected by the program may include clients, family members, neighborhood organizations, academic institutions, elected and appointed officials, advocacy groups, and community residents. Individuals who are openly skeptical of or antagonistic toward the program may also be important to involve. Opening an evaluation to opposing perspectives and enlisting the help of potential program opponents can strengthen the evaluation's credibility.

Likewise, individuals or groups who could be adversely or inadvertently affected by changes arising from the evaluation have a right to be engaged. For example, it is important to include those who would be affected if program services were expanded, altered, limited, or ended as a result of the evaluation.

  • Primary intended users of the evaluation are the specific individuals who are in a position to decide and/or do something with the results. They shouldn't be confused with the primary intended users of the program, although some of them should be involved in this group. In fact, primary intended users should be a subset of all of the stakeholders who have been identified. A successful evaluation will designate primary intended users, such as program staff and funders, early in its development and maintain frequent interaction with them to be sure that the evaluation specifically addresses their values and needs.

The amount and type of stakeholder involvement will be different for each program evaluation. For instance, stakeholders can be directly involved in designing and conducting the evaluation. They can be kept informed about progress of the evaluation through periodic meetings, reports, and other means of communication.

It may be helpful, when working with a group such as this, to develop an explicit process to share power and resolve conflicts. This may help avoid overemphasis of values held by any specific stakeholder.

Describe the Program

A program description is a summary of the intervention being evaluated. It should explain what the program is trying to accomplish and how it tries to bring about those changes. The description will also illustrate the program's core components and elements, its ability to make changes, its stage of development, and how the program fits into the larger organizational and community environment.

How a program is described sets the frame of reference for all future decisions about its evaluation. For example, if a program is described as, "attempting to strengthen enforcement of existing laws that discourage underage drinking," the evaluation might be very different than if it is described as, "a program to reduce drunk driving by teens." Also, the description allows members of the group to compare the program to other similar efforts, and it makes it easier to figure out what parts of the program brought about what effects.

Moreover, different stakeholders may have different ideas about what the program is supposed to achieve and why. For example, a program to reduce teen pregnancy may have some members who believe this means only increasing access to contraceptives, and other members who believe it means only focusing on abstinence.

Evaluations done without agreement on the program definition aren't likely to be very useful. In many cases, the process of working with stakeholders to develop a clear and logical program description will bring benefits long before data are available to measure program effectiveness.

There are several specific aspects that should be included when describing a program.

Statement of need

A statement of need describes the problem, goal, or opportunity that the program addresses; it also begins to imply what the program will do in response. Important features to note regarding a program's need are: the nature of the problem or goal, who is affected, how big it is, and whether (and how) it is changing.


Expectations

Expectations are the program's intended results. They describe what the program has to accomplish to be considered successful. For most programs, the accomplishments exist on a continuum (first, we want to accomplish X... then, we want to do Y...). Therefore, they should be organized by time, ranging from specific (and immediate) to broad (and longer-term) consequences. For example, a program's vision, mission, goals, and objectives all represent varying levels of specificity about a program's expectations.

Activities

Activities are everything the program does to bring about changes. Describing program components and elements permits specific strategies and actions to be listed in logical sequence. This also shows how different program activities, such as education and enforcement, relate to one another. Describing program activities also provides an opportunity to distinguish activities that are the direct responsibility of the program from those that are conducted by related programs or partner organizations. Things outside of the program that may affect its success, such as harsher laws punishing businesses that sell alcohol to minors, can also be noted.

Resources

Resources include the time, talent, equipment, information, money, and other assets available to conduct program activities. Reviewing the resources a program has tells a lot about the amount and intensity of its services. It may also point out situations where there is a mismatch between what the group wants to do and the resources available to carry out these activities. Understanding program costs is necessary for assessing the cost-benefit ratio as part of the evaluation.
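The cost-benefit arithmetic mentioned above can be sketched in a few lines. The program, cost categories, and dollar figures below are all hypothetical placeholders; in practice, monetizing benefits is the hardest part of the exercise:

```python
# Minimal cost-benefit arithmetic for a hypothetical free-breakfast program.
# Every figure here is a placeholder, not data from a real budget.

costs = {
    "staff time": 42_000,   # annual salaries and benefits
    "food": 18_000,
    "equipment": 5_000,
}

# Monetized benefits are assumptions for illustration only.
benefits = {
    "reduced absenteeism": 30_000,
    "improved test outcomes": 55_000,
}

total_cost = sum(costs.values())
total_benefit = sum(benefits.values())
ratio = total_benefit / total_cost

print(f"total cost: ${total_cost:,}")
print(f"cost-benefit ratio: {ratio:.2f}")  # dollars of benefit per dollar spent
```

A ratio above 1.0 suggests the monetized benefits exceed costs, but the conclusion is only as credible as the assumptions behind each monetized benefit.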

Stage of development

A program's stage of development reflects its maturity. All community health and development programs mature and change over time. People who conduct evaluations, as well as those who use their findings, need to consider the dynamic nature of programs. For example, a new program that just received its first grant may differ in many respects from one that has been running for over a decade.

At least three phases of development are commonly recognized: planning , implementation , and effects or outcomes . In the planning stage, program activities are untested and the goal of evaluation is to refine plans as much as possible. In the implementation phase, program activities are being field tested and modified; the goal of evaluation is to see what happens in the "real world" and to improve operations. In the effects stage, enough time has passed for the program's effects to emerge; the goal of evaluation is to identify and understand the program's results, including those that were unintentional.

Context

A description of the program's context considers the important features of the environment in which the program operates. This includes understanding the area's history, geography, politics, and social and economic conditions, and also what other organizations have done. A realistic and responsive evaluation is sensitive to a broad range of potential influences on the program. An understanding of the context lets users interpret findings accurately and assess their generalizability. For example, a program to improve housing in an inner-city neighborhood might have been a tremendous success, but would likely not work in a small town on the other side of the country without significant adaptation.

Logic model

A logic model synthesizes the main program elements into a picture of how the program is supposed to work. It makes explicit the sequence of events that are presumed to bring about change. Often this logic is displayed in a flow-chart, map, or table to portray the sequence of steps leading to program results.

Creating a logic model allows stakeholders to improve and focus program direction. It reveals assumptions about conditions for program effectiveness and provides a frame of reference for one or more evaluations of the program. A detailed logic model can also be a basis for estimating the program's effect on endpoints that are not directly measured. For example, it may be possible to estimate the rate of reduction in disease from a known number of persons experiencing the intervention if there is prior knowledge about its effectiveness.
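A logic model's sequence of events, and the kind of back-of-envelope estimate described above, can be sketched as follows. The stages and elements reuse the hypothetical Drive Smart program, and the exposure and effectiveness figures are invented for illustration:

```python
# A logic model as a simple ordered chain, using the hypothetical Drive Smart
# program. Each stage lists the elements presumed to lead to the next stage.

logic_model = [
    ("inputs",     ["volunteer drivers", "grant funding", "media partners"]),
    ("activities", ["public education campaign", "Safe Rides taxi service"]),
    ("outputs",    ["residents reached", "rides provided"]),
    ("outcomes",   ["less drinking and driving", "fewer nighttime crashes"]),
]

for stage, elements in logic_model:
    print(f"{stage:>10}: " + ", ".join(elements))

# A detailed logic model can support estimates of endpoints that aren't
# directly measured. Hypothetical figures: if 5,000 people are exposed and
# prior studies suggest exposure averts 0.4 crashes per 1,000 people exposed,
# the expected number of crashes averted is:
exposed = 5_000
averted_per_1000 = 0.4
estimated_averted = exposed / 1000 * averted_per_1000
print(f"estimated crashes averted: {estimated_averted:.0f}")
```

Even this bare-bones version makes assumptions visible: if stakeholders disagree about what belongs in the "outcomes" row, that disagreement is better discovered now than after data collection.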

The breadth and depth of a program description will vary for each program evaluation. And so, many different activities may be part of developing that description. For instance, multiple sources of information could be pulled together to construct a well-rounded description. The accuracy of an existing program description could be confirmed through discussion with stakeholders. Descriptions of what's going on could be checked against direct observation of activities in the field. A narrow program description could be fleshed out by addressing contextual factors (such as staff turnover, inadequate resources, political pressures, or strong community participation) that may affect program performance.

Focus the Evaluation Design

By focusing the evaluation design, we mean doing advance planning about where the evaluation is headed, and what steps it will take to get there. It isn't possible or useful for an evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-focused plan is a safeguard against using time and resources inefficiently.

Depending on what you want to learn, some types of evaluation will be better suited than others. However, once data collection begins, it may be difficult or impossible to change what you are doing, even if it becomes obvious that other methods would work better. A thorough plan anticipates intended uses and creates an evaluation strategy with the greatest chance to be useful, feasible, proper, and accurate.

Among the issues to consider when focusing an evaluation are:

Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the design, methods, and use of the evaluation. Taking time to articulate an overall purpose will help your organization avoid making uninformed decisions about how the evaluation should be conducted and used.

There are at least four general purposes for which a community group might conduct an evaluation:

  • To gain insight. This happens, for example, when deciding whether to use a new approach (e.g., would a neighborhood watch program work for our community?). Knowledge from such an evaluation will provide information about its practicality. For a developing program, information from evaluations of similar programs can provide the insight needed to clarify how its activities should be designed.
  • To improve how things get done. This is appropriate in the implementation stage when an established program tries to describe what it has done. This information can be used to describe program processes, to improve how the program operates, and to fine-tune the overall strategy. Evaluations done for this purpose include efforts to improve the quality, effectiveness, or efficiency of program activities.
  • To determine what the effects of the program are. Evaluations done for this purpose examine the relationship between program activities and observed consequences. For example, are more students finishing high school as a result of the program? Programs most appropriate for this type of evaluation are mature programs that are able to state clearly what happened and to whom it happened. Such evaluations should provide evidence about the program's contribution to reaching longer-term goals such as a decrease in child abuse or crime in the area. This type of evaluation helps establish the accountability, and thus the credibility, of a program to funders and to the community.
  • To affect participants. Evaluations done for this purpose use the evaluation process itself to create change. For example, an evaluation can:
  • Empower program participants (for example, being part of an evaluation can increase community members' sense of control over the program);
  • Supplement the program (for example, using a follow-up questionnaire can reinforce the main messages of the program);
  • Promote staff development (for example, by teaching staff how to collect, analyze, and interpret evidence); or
  • Contribute to organizational growth (for example, the evaluation may clarify how the program relates to the organization's mission).

Users are the specific individuals who will receive evaluation findings. They will directly experience the consequences of inevitable trade-offs in the evaluation process. For example, a trade-off might be having a relatively modest evaluation to fit the budget, with the outcome that the evaluation results will be less certain than they would be for a full-scale evaluation. Because they will be affected by these trade-offs, intended users have a right to participate in choosing a focus for the evaluation. An evaluation designed without adequate user involvement in selecting the focus can become a misguided and irrelevant exercise. By contrast, when users are encouraged to clarify intended uses, priority questions, and preferred methods, the evaluation is more likely to focus on things that will inform (and influence) future actions.

Uses describe what will be done with what is learned from the evaluation. There is a wide range of potential uses for program evaluation. Generally speaking, the uses fall in the same four categories as the purposes listed above: to gain insight, improve how things get done, determine what the effects of the program are, and affect participants. The following list gives examples of uses in each category.

Some specific examples of evaluation uses

To gain insight:

  • Assess needs and wants of community members
  • Identify barriers to use of the program
  • Learn how to best describe and measure program activities

To improve how things get done:

  • Refine plans for introducing a new practice
  • Determine the extent to which plans were implemented
  • Improve educational materials
  • Enhance cultural competence
  • Verify that participants' rights are protected
  • Set priorities for staff training
  • Make mid-course adjustments
  • Clarify communication
  • Determine if client satisfaction can be improved
  • Compare costs to benefits
  • Find out which participants benefit most from the program
  • Mobilize community support for the program

To determine what the effects of the program are:

  • Assess skills development by program participants
  • Compare changes in behavior over time
  • Decide where to allocate new resources
  • Document the level of success in accomplishing objectives
  • Demonstrate that accountability requirements are fulfilled
  • Use information from multiple evaluations to predict the likely effects of similar programs

To affect participants:

  • Reinforce messages of the program
  • Stimulate dialogue and raise awareness about community issues
  • Broaden consensus among partners about program goals
  • Teach evaluation skills to staff and other stakeholders
  • Gather success stories
  • Support organizational change and improvement

The evaluation needs to answer specific questions. Drafting questions encourages stakeholders to reveal what they believe the evaluation should answer. That is, which questions are most important to stakeholders? The process of developing evaluation questions further refines the focus of the evaluation.

The methods available for an evaluation are drawn from behavioral science and social research and development. Three types of methods are commonly recognized: experimental, quasi-experimental, and observational or case study designs. Experimental designs use random assignment to compare the effect of an intervention between otherwise equivalent groups (for example, comparing a randomly assigned group of students who took part in an after-school reading program with those who didn't). Quasi-experimental methods make comparisons between groups that aren't equal (e.g., program participants vs. those on a waiting list) or comparisons within a group over time, such as in an interrupted time series in which the intervention may be introduced sequentially across different individuals, groups, or contexts. Observational or case study methods use comparisons within a group to describe and explain what happens (e.g., comparative case studies with multiple communities).

No design is necessarily better than another. Evaluation methods should be selected because they provide the appropriate information to answer stakeholders' questions, not because they are familiar, easy, or popular. The choice of methods has implications for what will count as evidence, how that evidence will be gathered, and what kind of claims can be made. Because each method option has its own biases and limitations, evaluations that mix methods are generally more robust.
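The experimental design described above - random assignment followed by a comparison of otherwise-equivalent groups - can be sketched with simulated data. The program, outcome measure, and effect size here are all invented for illustration:

```python
# Sketch of an experimental design: randomly assign participants to program
# vs. control, then compare mean outcomes. All data are simulated.
import random

random.seed(1)  # make the simulation reproducible

participants = list(range(200))
random.shuffle(participants)
program, control = participants[:100], participants[100:]

# Simulated outcome (e.g., a reading score out of 100). We build in a true
# program effect of about 5 points, plus individual noise.
def outcome(in_program: bool) -> float:
    base = random.gauss(70, 10)
    return base + (5 if in_program else 0)

program_scores = [outcome(True) for _ in program]
control_scores = [outcome(False) for _ in control]

effect = (sum(program_scores) / len(program_scores)
          - sum(control_scores) / len(control_scores))
print(f"estimated program effect: {effect:.1f} points")
```

Because assignment was random, the two groups differ (in expectation) only in program exposure, so the difference in means is a credible estimate of the program's effect; the quasi-experimental designs above give up that guarantee in exchange for feasibility.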

Over the course of an evaluation, methods may need to be revised or modified. Circumstances that make a particular approach useful can change. For example, the intended use of the evaluation could shift from discovering how to improve the program to helping decide about whether the program should continue or not. Thus, methods may need to be adapted or redesigned to keep the evaluation on track.

Agreements summarize the evaluation procedures and clarify everyone's roles and responsibilities. An agreement describes how the evaluation activities will be implemented. Elements of an agreement include statements about the intended purpose, users, uses, and methods, as well as a summary of the deliverables, those responsible, a timeline, and budget.

The formality of the agreement depends upon the relationships that exist between those involved. For example, it may take the form of a legal contract, a detailed protocol, or a simple memorandum of understanding. Regardless of its formality, creating an explicit agreement provides an opportunity to verify the mutual understanding needed for a successful evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.

As you can see, focusing the evaluation design may involve many activities. For instance, both supporters and skeptics of the program could be consulted to ensure that the proposed evaluation questions are politically viable. A menu of potential evaluation uses appropriate for the program's stage of development could be circulated among stakeholders to determine which is most compelling. Interviews could be held with specific intended users to better understand their information needs and timeline for action. Resource requirements could be reduced when users are willing to employ more timely but less precise evaluation methods.

Gather Credible Evidence

Credible evidence is the raw material of a good evaluation. The information learned should be seen by stakeholders as believable, trustworthy, and relevant to answering their questions. This requires thinking broadly about what counts as "evidence." Such decisions are always situational; they depend on the question being posed and the motives for asking it. For some questions, a stakeholder's standard for credibility could demand the results of a randomized experiment. For other questions, a set of well-done, systematic observations - such as interactions between an outreach worker and community residents - will have high credibility. The difference depends on what kind of information the stakeholders want and the situation in which it is gathered.

Context matters! In some situations, it may be necessary to consult evaluation specialists. This may be especially true if concern for data quality is high. In other circumstances, local people may offer the deepest insights. Regardless of their expertise, however, those involved in an evaluation should strive to collect information that will convey a credible, well-rounded picture of the program and its efforts.

Having credible evidence strengthens the evaluation results as well as the recommendations that follow from them. Although all types of data have limitations, it is possible to improve an evaluation's overall credibility. One way to do this is by using multiple procedures for gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can also enhance perceived credibility. When stakeholders help define questions and gather data, they will be more likely to accept the evaluation's conclusions and to act on its recommendations.

The following features of evidence gathering typically affect how credible it is perceived to be:

Indicators translate general concepts about the program and its expected effects into specific, measurable parts.

Examples of indicators include:

  • The program's capacity to deliver services
  • The participation rate
  • The level of client satisfaction
  • The amount of intervention exposure (how many people were exposed to the program, and for how long)
  • Changes in participant behavior
  • Changes in community conditions or norms
  • Changes in the environment (e.g., new programs, policies, or practices)
  • Longer-term changes in population health status (e.g., estimated teen pregnancy rate in the county)

Indicators should address the criteria that will be used to judge the program. That is, they reflect the aspects of the program that are most meaningful to monitor. Several indicators are usually needed to track the implementation and effects of a complex program or intervention.

One way to develop multiple indicators is to create a "balanced scorecard," which contains indicators that are carefully selected to complement one another. According to this strategy, program processes and effects are viewed from multiple perspectives using small groups of related indicators. For instance, a balanced scorecard for a single program might include indicators of how the program is being delivered; what participants think of the program; what effects are observed; what goals were attained; and what changes are occurring in the environment around the program.
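
The balanced-scorecard idea can be illustrated with a small sketch. The perspective names and indicator values below are hypothetical examples, not part of the framework; a real scorecard would use indicators negotiated with stakeholders.

```python
# A minimal, hypothetical "balanced scorecard": small groups of related
# indicators, viewed from several perspectives at once.
scorecard = {
    "delivery":      {"sessions_held": 24, "sessions_planned": 26},
    "participation": {"enrolled": 180, "attended_half_or_more": 140},
    "satisfaction":  {"avg_rating_out_of_5": 4.2},
    "environment":   {"new_partner_agencies": 3},
}

def summarize(card):
    """Print one line per perspective so no single view dominates."""
    for perspective, indicators in card.items():
        parts = ", ".join(f"{key}={value}" for key, value in indicators.items())
        print(f"{perspective}: {parts}")

summarize(scorecard)
```

Grouping the indicators this way makes it easy to see at a glance whether one perspective (for example, delivery) is being tracked at the expense of another (for example, participant satisfaction).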

Another approach to using multiple indicators is based on a program logic model, such as we discussed earlier in the section. A logic model can be used as a template to define a full spectrum of indicators along the pathway that leads from program activities to expected effects. For each step in the model, qualitative and/or quantitative indicators could be developed.

Indicators can be broad-based and don't need to focus only on a program's long-term goals. They can also address intermediary factors that influence program effectiveness, including such intangible factors as service quality, community capacity, or inter-organizational relations. Indicators for these and similar concepts can be created by systematically identifying and then tracking markers of what is said or done when the concept is expressed.

In the course of an evaluation, indicators may need to be modified or new ones adopted. Also, measuring program performance by tracking indicators is only one part of evaluation, and shouldn't be mistaken for a sufficient basis for decision making in itself. There are definite perils to using performance indicators as a substitute for completing the evaluation process and reaching fully justified conclusions. For example, an indicator such as a rising rate of unemployment may be falsely assumed to reflect a failing program when it is actually due to changing environmental conditions beyond the program's control.

Sources of evidence in an evaluation may be people, documents, or observations. More than one source may be used to gather evidence for each indicator. In fact, selecting multiple sources provides an opportunity to include different perspectives about the program and enhances the evaluation's credibility. For instance, an inside perspective may be reflected by internal documents and comments from staff or program managers; whereas clients and those who do not support the program may provide different, but equally relevant perspectives. Mixing these and other perspectives provides a more comprehensive view of the program or intervention.

The criteria used to select sources should be clearly stated so that users and other stakeholders can interpret the evidence accurately and assess if it may be biased. In addition, some sources provide information in narrative form (for example, a person's experience when taking part in the program) and others are numerical (for example, how many people were involved in the program). The integration of qualitative and quantitative information can yield evidence that is more complete and more useful, thus meeting the needs and expectations of a wider range of stakeholders.

Quality refers to the appropriateness and integrity of information gathered in an evaluation. High-quality data are reliable and informative, and they are easier to collect when the indicators have been well defined. Other factors that affect quality include instrument design, data collection procedures, training of data collectors, source selection, coding, data management, and routine error checking. Obtaining quality data will entail tradeoffs (e.g., breadth vs. depth); stakeholders should decide together what is most important to them. Because all data have limitations, the intent of a practical evaluation is to strive for a level of quality that meets the stakeholders' threshold for credibility.

Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to estimate in advance the amount of information that will be required and to establish criteria to decide when to stop collecting data - to know when enough is enough. Quantity affects the level of confidence or precision users can have - how sure we are that what we've learned is true. It also partly determines whether the evaluation will be able to detect effects. All evidence collected should have a clear, anticipated use.
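
The link between quantity and precision can be made concrete with a standard margin-of-error calculation. This is a simplified sketch assuming a simple random sample and a 95% confidence level; real community surveys often require design adjustments.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size roughly halves the margin of error.
print(round(margin_of_error(0.5, 100), 3))  # 0.098 (about +/- 10 points)
print(round(margin_of_error(0.5, 400), 3))  # 0.049 (about +/- 5 points)
```

A calculation like this, done in advance, helps stakeholders decide how much evidence is enough: collecting four times the data buys only twice the precision, so the added confidence must be worth the added cost.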

By logistics, we mean the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations also have cultural preferences that dictate acceptable ways of asking questions and collecting information, including who would be perceived as an appropriate person to ask the questions. For example, some participants may be unwilling to discuss their behavior with a stranger, whereas others are more at ease with someone they don't know. Therefore, the techniques for gathering evidence in an evaluation must be in keeping with the cultural norms of the community. Data collection procedures should also ensure that confidentiality is protected.

Justify Conclusions

The process of justifying conclusions recognizes that evidence in an evaluation does not necessarily speak for itself. Evidence must be carefully considered from a number of different stakeholders' perspectives to reach conclusions that are well-substantiated and justified. Conclusions become justified when they are linked to the evidence gathered and judged against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions are justified in order to use the evaluation results with confidence.

The principal elements involved in justifying conclusions based on evidence are:

Standards reflect the values held by stakeholders about the program and provide the basis for making judgments about it. The use of explicit standards for judgment is fundamental to sound evaluation. In practice, when stakeholders articulate and negotiate their values, these become the standards for judging whether a given program's performance will, for instance, be considered "successful," "adequate," or "unsuccessful."

Analysis and synthesis

Analysis and synthesis are methods to discover and summarize an evaluation's findings. They are designed to detect patterns in evidence, either by isolating important findings (analysis) or by combining different sources of information to reach a larger understanding (synthesis). Mixed method evaluations require the separate analysis of each evidence element, as well as a synthesis of all sources to examine patterns that emerge. Deciphering facts from a given body of evidence involves deciding how to organize, classify, compare, and display information. These decisions are guided by the questions being asked, the types of data available, and especially by input from stakeholders and primary intended users.


Interpretation is the effort to figure out what the findings mean. Uncovering facts about a program's performance isn't enough to draw conclusions; the facts must be interpreted to understand their practical significance. For example, the statement "15% of the people in our area witnessed a violent act last year" may be interpreted differently depending on the situation. If 50% of community members surveyed five years ago had witnessed a violent act, the group can conclude that, while violence remains a problem, things are getting better in the community. However, if five years ago only 7% of those surveyed said the same thing, community organizations may see this as a sign that they should change what they are doing. In short, interpretations draw on the information and perspectives that stakeholders bring to the evaluation. They can be strengthened through active participation or interaction with the data and preliminary explanations of what happened.
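
The arithmetic behind the example above is simple, but making the comparison explicit shows why the same figure supports opposite interpretations. A minimal sketch, using the hypothetical percentages from the text:

```python
def interpret(current, baseline):
    """Compare a current rate to a baseline rate and describe the trend."""
    if current < baseline:
        return f"improving: {baseline}% -> {current}%"
    if current > baseline:
        return f"worsening: {baseline}% -> {current}%"
    return f"unchanged at {current}%"

# 15% of residents witnessed a violent act this year.
print(interpret(15, 50))  # vs. 50% five years ago: improving
print(interpret(15, 7))   # vs. 7% five years ago: worsening
```

The point is not the code but the discipline it encodes: a finding only becomes meaningful once it is compared against an agreed-upon baseline or standard.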

Judgments are statements about the merit, worth, or significance of the program. They are formed by comparing the findings and their interpretations against one or more selected standards. Because multiple standards can be applied to a given program, stakeholders may reach different or even conflicting judgments. For instance, a program that increases its outreach by 10% from the previous year may be judged positively by program managers, based on standards of improved performance over time. Community members, however, may feel that despite improvements, a minimum threshold of access to services has still not been reached. Their judgment, based on standards of social equity, would therefore be negative. Conflicting claims about a program's quality, value, or importance often indicate that stakeholders are using different standards or values in making judgments. This type of disagreement can be a catalyst to clarify values and to negotiate the appropriate basis (or bases) on which the program should be judged.


Recommendations are actions to consider as a result of the evaluation. Forming recommendations requires information beyond just what is necessary to form judgments. For example, knowing that a program is able to increase the services available to battered women doesn't necessarily translate into a recommendation to continue the effort, particularly when there are competing priorities or other effective alternatives. Thus, recommendations about what to do with a given intervention go beyond judgments about a specific program's effectiveness.

If recommendations aren't supported by enough evidence, or if they aren't in keeping with stakeholders' values, they can seriously undermine an evaluation's credibility. By contrast, an evaluation can be strengthened by recommendations that anticipate and respond to what users will want to know.

Three things might increase the chances that recommendations will be relevant and well-received:

  • Sharing draft recommendations
  • Soliciting reactions from multiple stakeholders
  • Presenting options instead of directive advice

Justifying conclusions in an evaluation is a process that can involve several steps. For instance, conclusions could be strengthened by searching for alternative explanations to the ones you have chosen, and then showing why they are unsupported by the evidence. When there are different but equally well-supported conclusions, each could be presented with a summary of its strengths and weaknesses. Techniques to analyze, synthesize, and interpret findings might be agreed upon before data collection begins.

Ensure Use and Share Lessons Learned

It is naive to assume that lessons learned in an evaluation will necessarily be used in decision making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure that the evaluation findings will be used appropriately. Preparing for their use involves strategic thinking and continued vigilance in looking for opportunities to communicate and influence. Both of these should begin in the earliest stages of the process and continue throughout the evaluation.

The key elements for ensuring that an evaluation's recommendations are used are:

Design refers to how the evaluation's questions, methods, and overall processes are constructed. As discussed in the third step of this framework (focusing the evaluation design), the evaluation should be organized from the start to achieve specific agreed-upon uses. Having a clear purpose that is focused on the use of what is learned helps those who will carry out the evaluation to know who will do what with the findings. Furthermore, the process of creating a clear design will highlight ways that stakeholders, through their many contributions, can improve the evaluation and facilitate the use of the results.


Preparation refers to the steps taken to get ready for the future uses of the evaluation findings. The ability to translate new knowledge into appropriate action is a skill that can be strengthened through practice. In fact, building this skill can itself be a useful benefit of the evaluation. It is possible to prepare stakeholders for future use of the results by discussing how potential findings might affect decision making.

For example, primary intended users and other stakeholders could be given a set of hypothetical results and asked what decisions or actions they would make on the basis of this new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and that no action would be taken, then this is an early warning sign that the planned evaluation should be modified. Preparing for use also gives stakeholders more time to explore both positive and negative implications of potential results and to identify different options for program improvement.

Feedback is the communication that occurs among everyone involved in the evaluation. Giving and receiving feedback creates an atmosphere of trust among stakeholders; it keeps an evaluation on track by keeping everyone informed about how the evaluation is proceeding. Primary intended users and other stakeholders have a right to comment on evaluation decisions. From a standpoint of ensuring use, stakeholder feedback is a necessary part of every step in the evaluation. Obtaining valuable feedback can be encouraged by holding discussions during each step of the evaluation and routinely sharing interim findings, provisional interpretations, and draft reports.

Follow-up refers to the support that many users need during the evaluation and after they receive evaluation findings. Because of the amount of effort required, reaching justified conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be necessary to remind users of the intended uses of what has been learned. Follow-up may also be required to keep lessons learned from being lost or ignored in the process of making complex or political decisions. To guard against such oversight, it may be helpful to have someone involved in the evaluation serve as an advocate for the evaluation's findings during the decision-making phase.

Facilitating the use of evaluation findings also carries with it the responsibility to prevent misuse. Evaluation results are always bounded by the context in which the evaluation was conducted. Some stakeholders, however, may be tempted to take results out of context or to use them for purposes other than those for which they were developed. For instance, over-generalizing the results from a single case study to make decisions that affect all sites in a national program is a misuse of a case study evaluation.

Similarly, program opponents may misuse results by overemphasizing negative findings without giving proper credit for what has worked. Active follow-up can help to prevent these and other forms of misuse by ensuring that evidence is only applied to the questions that were the central focus of the evaluation.


Dissemination is the process of communicating the procedures or the lessons learned from an evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other elements of the evaluation, the reporting strategy should be discussed in advance with intended users and other stakeholders. Planning effective communications also requires considering the timing, style, tone, message source, vehicle, and format of information products. Regardless of how communications are constructed, the goal for dissemination is to achieve full disclosure and impartial reporting.

Along with the uses for evaluation findings, there are also uses that flow from the very process of evaluating. These "process uses" should be encouraged. The people who take part in an evaluation can experience profound changes in beliefs and behavior. For instance, an evaluation challenges staff members to look critically at what they are doing and to question the assumptions that connect program activities with intended effects.

Evaluation also prompts staff to clarify their understanding of the goals of the program. This greater clarity, in turn, helps staff members to better function as a team focused on a common end. In short, immersion in the logic, reasoning, and values of evaluation can have very positive effects, such as basing decisions on systematic judgments instead of on unfounded assumptions.

Additional process uses for evaluation include:

  • Defining indicators makes clear what really matters to stakeholders
  • Evaluation helps make outcomes matter by changing the reinforcements connected with achieving positive results. For example, a funder might offer "bonus grants" or "outcome dividends" to a program that has shown a significant amount of community change and improvement.

Standards for "good" evaluation

There are standards to assess whether all of the parts of an evaluation are well-designed and working to their greatest potential. The Joint Committee on Standards for Educational Evaluation developed "The Program Evaluation Standards" for this purpose. Although designed to assess evaluations of educational programs, these standards are also relevant for programs and interventions related to community health and development.

The program evaluation standards make it practical to conduct sound and fair evaluations. They offer well-supported principles to follow when faced with having to make tradeoffs or compromises. Attending to the standards can guard against an imbalanced evaluation, such as one that is accurate and feasible, but isn't very useful or sensitive to the context. Another example of an imbalanced evaluation is one that would be genuinely useful, but is impossible to carry out.

The following standards can be applied while developing an evaluation design and throughout the course of its implementation. Remember, the standards are written as guiding principles, not as rigid rules to be followed in all situations.

The 30 standards are grouped into four categories: utility, feasibility, propriety, and accuracy.

Utility Standards

The utility standards ensure that the evaluation will serve the information needs of its intended users.

The seven utility standards are:

  • Stakeholder Identification : People who are involved in (or will be affected by) the evaluation should be identified, so that their needs can be addressed.
  • Evaluator Credibility : The people conducting the evaluation should be both trustworthy and competent, so that the evaluation will be generally accepted as credible or believable.
  • Information Scope and Selection : Information collected should address pertinent questions about the program, and it should be responsive to the needs and interests of clients and other specified stakeholders.
  • Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for judgments about merit and value are clear.
  • Report Clarity: Evaluation reports should clearly describe the program being evaluated, including its context, and the purposes, procedures, and findings of the evaluation. This will help ensure that essential information is provided and easily understood.
  • Report Timeliness and Dissemination: Significant midcourse findings and evaluation reports should be shared with intended users so that they can be used in a timely fashion.
  • Evaluation Impact: Evaluations should be planned, conducted, and reported in ways that encourage follow-through by stakeholders, so that the evaluation will be used.

Feasibility Standards

The feasibility standards ensure that the evaluation makes sense - that the planned steps are both viable and pragmatic.

The feasibility standards are:

  • Practical Procedures: The evaluation procedures should be practical, to keep disruption of everyday activities to a minimum while needed information is obtained.
  • Political Viability : The evaluation should be planned and conducted with anticipation of the different positions or interests of various groups. This should help in obtaining their cooperation so that possible attempts by these groups to curtail evaluation operations or to misuse the results can be avoided or counteracted.
  • Cost Effectiveness: The evaluation should be efficient and produce enough valuable information that the resources used can be justified.

Propriety Standards

The propriety standards ensure that the evaluation is an ethical one, conducted with regard for the rights and interests of those involved. The eight propriety standards follow.

  • Service Orientation : Evaluations should be designed to help organizations effectively serve the needs of all of the targeted participants.
  • Formal Agreements : The responsibilities in an evaluation (what is to be done, how, by whom, when) should be agreed to in writing, so that those involved are obligated to follow all conditions of the agreement, or to formally renegotiate it.
  • Rights of Human Subjects : Evaluation should be designed and conducted to respect and protect the rights and welfare of human subjects, that is, all participants in the study.
  • Human Interactions : Evaluators should respect basic human dignity and worth when working with other people in an evaluation, so that participants don't feel threatened or harmed.
  • Complete and Fair Assessment : The evaluation should be complete and fair in its examination, recording both strengths and weaknesses of the program being evaluated. This allows strengths to be built upon and problem areas addressed.
  • Disclosure of Findings : The people working on the evaluation should ensure that all of the evaluation findings, along with the limitations of the evaluation, are accessible to everyone affected by the evaluation, and any others with expressed legal rights to receive the results.
  • Conflict of Interest: Conflict of interest should be dealt with openly and honestly, so that it does not compromise the evaluation processes and results.
  • Fiscal Responsibility : The evaluator's use of resources should reflect sound accountability procedures and otherwise be prudent and ethically responsible, so that expenditures are accounted for and appropriate.

Accuracy Standards

The accuracy standards ensure that the evaluation findings are accurate and technically sound.

There are 12 accuracy standards:

  • Program Documentation: The program should be described and documented clearly and accurately, so that what is being evaluated is clearly identified.
  • Context Analysis: The context in which the program exists should be thoroughly examined so that likely influences on the program can be identified.
  • Described Purposes and Procedures: The purposes and procedures of the evaluation should be monitored and described in enough detail that they can be identified and assessed.
  • Defensible Information Sources: The sources of information used in a program evaluation should be described in enough detail that the adequacy of the information can be assessed.
  • Valid Information: The information gathering procedures should be chosen or developed and then implemented in such a way that they will assure that the interpretation arrived at is valid.
  • Reliable Information : The information gathering procedures should be chosen or developed and then implemented so that they will assure that the information obtained is sufficiently reliable.
  • Systematic Information: The information from an evaluation should be systematically reviewed and any errors found should be corrected.
  • Analysis of Quantitative Information: Quantitative information - data from observations or surveys - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Analysis of Qualitative Information: Qualitative information - descriptive information from interviews and other sources - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Justified Conclusions: The conclusions reached in an evaluation should be explicitly justified, so that stakeholders can understand their worth.
  • Impartial Reporting: Reporting procedures should guard against the distortion caused by personal feelings and biases of people involved in the evaluation, so that evaluation reports fairly reflect the evaluation findings.
  • Metaevaluation: The evaluation itself should be evaluated against these and other pertinent standards, so that it is appropriately guided and, on completion, stakeholders can closely examine its strengths and weaknesses.

Applying the framework: Conducting optimal evaluations

There is ever-increasing agreement on the worth of evaluation; in fact, evaluation is often required by funders and other constituents. Community health and development professionals, therefore, can no longer question whether or not to evaluate their programs. Instead, the appropriate questions are:

  • What is the best way to evaluate?
  • What are we learning from the evaluation?
  • How will we use what we learn to become more effective?

The framework for program evaluation helps answer these questions by guiding users to select evaluation strategies that are useful, feasible, proper, and accurate.

To use this framework requires quite a bit of skill in program evaluation. In most cases there are multiple stakeholders to consider, the political context may be divisive, steps don't always follow a logical order, and limited resources may make it difficult to take a preferred course of action. An evaluator's challenge is to devise an optimal strategy, given the conditions she is working under. An optimal strategy is one that accomplishes each step in the framework in a way that takes into account the program context and is able to meet or exceed the relevant standards.

This framework also makes it possible to respond to common concerns about program evaluation. For instance, many evaluations are not undertaken because they are seen as being too expensive. The cost of an evaluation, however, is relative; it depends upon the question being asked and the level of certainty desired for the answer. A simple, low-cost evaluation can deliver information valuable for understanding and improvement.

Rather than discounting evaluations as a time-consuming sideline, the framework encourages evaluations that are timed strategically to provide necessary feedback. This makes it possible to link evaluation closely with everyday practice.

Another concern centers on the perceived technical demands of designing and conducting an evaluation. However, the practical approach endorsed by this framework focuses on questions that can improve the program.

Finally, the prospect of evaluation troubles many staff members because they perceive evaluation methods as punishing ("They just want to show what we're doing wrong."), exclusionary ("Why aren't we part of it? We're the ones who know what's going on."), and adversarial ("It's us against them.") The framework instead encourages an evaluation approach that is designed to be helpful and engages all interested stakeholders in a process that welcomes their participation.

Evaluation is a powerful strategy for distinguishing programs and interventions that make a difference from those that don't. It is a driving force for developing and adapting sound strategies, improving existing programs, and demonstrating the results of investments in time and other resources. It also helps determine if what is being done is worth the cost.

This recommended framework for program evaluation is both a synthesis of existing best practices and a set of standards for further improvement. It supports a practical approach to evaluation based on steps and standards that can be applied in almost any setting. Because the framework is purposefully general, it provides a stable guide for designing and conducting a wide range of evaluation efforts in a variety of specific program areas. The framework can be used as a template to create useful evaluation plans that contribute to understanding and improvement. For additional information on the requirements for good evaluation, and some straightforward steps to make a good evaluation of an intervention more feasible, read The Magenta Book - Guidance for Evaluation.

Online Resources

Are You Ready to Evaluate your Coalition? poses 15 questions to help your group decide whether your coalition is ready to evaluate itself and its work.

The  American Evaluation Association Guiding Principles for Evaluators  helps guide evaluators in their professional practice.

CDC Evaluation Resources  provides a list of resources for evaluation, as well as links to professional associations and journals.

Chapter 11: Community Interventions in the "Introduction to Community Psychology" explains professionally-led versus grassroots interventions, what it means for a community intervention to be effective, why a community needs to be ready for an intervention, and the steps to implementing community interventions.

The  Comprehensive Cancer Control Branch Program Evaluation Toolkit  is designed to help grantees plan and implement evaluations of their NCCCP-funded programs, this toolkit provides general guidance on evaluation principles and techniques, as well as practical templates and tools.

Developing an Effective Evaluation Plan  is a workbook provided by the CDC. In addition to information on designing an evaluation plan, this book also provides worksheets as a step-by-step guide.

EvaluACTION, from the CDC, is designed for people interested in learning about program evaluation and how to apply it to their work. Evaluation is a process, one dependent on what you're currently doing and on the direction in which you'd like to go. In addition to providing helpful information, the site also features an interactive Evaluation Plan & Logic Model Builder, so you can create customized tools for your organization to use.

Evaluating Your Community-Based Program  is a handbook designed by the American Academy of Pediatrics covering a variety of topics related to evaluation.

GAO Designing Evaluations  is a handbook provided by the U.S. Government Accountability Office with copious information regarding program evaluations.

The CDC's Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide is a "how-to" guide for planning and implementing evaluation activities. The manual, based on CDC's Framework for Program Evaluation in Public Health, is intended to assist with planning, designing, implementing and using comprehensive evaluations in a practical way.

McCormick Foundation Evaluation Guide  is a guide to planning an organization’s evaluation, with several chapters dedicated to gathering information and using it to improve the organization.

A Participatory Model for Evaluating Social Programs is a guide from the James Irvine Foundation.

Practical Evaluation for Public Managers  is a guide to evaluation written by the U.S. Department of Health and Human Services.

Penn State Program Evaluation  offers information on collecting different forms of data and how to measure different community markers.

Program Evaluation information page from Implementation Matters.

The Program Manager's Guide to Evaluation  is a handbook provided by the Administration for Children and Families with detailed answers to nine big questions regarding program evaluation.

Program Planning and Evaluation  is a website created by the University of Arizona. It provides links to information on several topics including methods, funding, types of evaluation, and reporting impacts.

User-Friendly Handbook for Program Evaluation  is a guide to evaluations provided by the National Science Foundation.  This guide includes practical information on quantitative and qualitative methodologies in evaluations.

W.K. Kellogg Foundation Evaluation Handbook  provides a framework for thinking about evaluation as a relevant and useful program tool. It was originally written for program directors with direct responsibility for the ongoing evaluation of the W.K. Kellogg Foundation.

Print Resources

This Community Tool Box section is an edited version of:

CDC Evaluation Working Group. (1999). (Draft). Recommended framework for program evaluation in public health practice . Atlanta, GA: Author.

The article cites the following references:

Adler, M., & Ziglio, E. (1996). Gazing into the oracle: The Delphi method and its application to social policy and community health and development. London: Jessica Kingsley Publishers.

Barrett, F.   Program Evaluation: A Step-by-Step Guide.  Sunnycrest Press, 2013. This practical manual includes helpful tips to develop evaluations, tables illustrating evaluation approaches, evaluation planning and reporting templates, and resources if you want more information.

Basch, C., Sliepcevich, E., Gold, R., Duncan, D., & Kolbe, L. (1985). Avoiding type III errors in health education program evaluation: a case study. Health Education Quarterly, 12(4), 315-31.

Bickman L, & Rog, D. (1998). Handbook of applied social research methods. Thousand Oaks, CA: Sage Publications.

Boruch, R.  (1998).  Randomized controlled experiments for evaluation and planning. In Handbook of applied social research methods, edited by Bickman L., & Rog. D. Thousand Oaks, CA: Sage Publications: 161-92.

Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention. (1999). Evaluating CDC HIV prevention programs: guidance and data system. Atlanta, GA: Centers for Disease Control and Prevention.

Centers for Disease Control and Prevention. Guidelines for evaluating surveillance systems. Morbidity and Mortality Weekly Report 1988;37(S-5):1-18.

Centers for Disease Control and Prevention. Handbook for evaluating HIV education . Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Adolescent and School Health, 1995.

Cook, T., & Campbell, D. (1979). Quasi-experimentation . Chicago, IL: Rand McNally.

Cook, T.,& Reichardt, C. (1979).  Qualitative and quantitative methods in evaluation research . Beverly Hills, CA: Sage Publications.

Cousins, J.,& Whitmore, E. (1998).   Framing participatory evaluation. In Understanding and practicing participatory evaluation , vol. 80, edited by E Whitmore. San Francisco, CA: Jossey-Bass: 5-24.

Chen, H. (1990).  Theory driven evaluations . Newbury Park, CA: Sage Publications.

de Vries, H., Weijts, W., Dijkstra, M., & Kok, G. (1992). The utilization of qualitative and quantitative data for health education program planning, implementation, and evaluation: a spiral approach. Health Education Quarterly, 19(1), 101-15.

Dyal, W. (1995).  Ten organizational practices of community health and development: a historical perspective . American Journal of Preventive Medicine;11(6):6-8.

Eddy, D. (1998). Performance measurement: problems and solutions. Health Affairs, 17(4), 7-25.

Harvard Family Research Project. (1998). Performance measurement. The Evaluation Exchange, 4, 1-15.

Eoyang, G., & Berkas, T. (1996). Evaluation in a complex adaptive system.

Taylor-Powell, E., Steele, S., & Douglah, M. (1999). Planning a program evaluation. Madison, WI: University of Wisconsin Cooperative Extension.

Fawcett, S.B., Paine-Andrews, A., Francisco, V.T., Schultz, J.A., Richter, K.P., Berkley-Patton, J., Fisher, J., Lewis, R.K., Lopez, C.M., Russos, S., Williams, E.L., Harris, K.J., & Evensen, P. (2001). Evaluating community initiatives for health and development. In I. Rootman, D. McQueen, et al. (Eds.), Evaluating health promotion approaches (pp. 241-277). Copenhagen, Denmark: World Health Organization - Europe.

Fawcett, S., Sterling, T., Paine-Andrews, A., Harris, K., Francisco, V., et al. (1996). Evaluating community efforts to prevent cardiovascular diseases. Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion.

Fetterman, D., Kaftarian, S., & Wandersman, A. (1996). Empowerment evaluation: knowledge and tools for self-assessment and accountability. Thousand Oaks, CA: Sage Publications.

Frechtling, J.,& Sharp, L. (1997).  User-friendly handbook for mixed method evaluations . Washington, DC: National Science Foundation.

Goodman, R., Speers, M., McLeroy, K., Fawcett, S., Kegler M., et al. (1998).  Identifying and defining the dimensions of community capacity to provide a basis for measurement . Health Education and Behavior;25(3):258-78.

Greene, J.  (1994). Qualitative program evaluation: practice and promise . In Handbook of Qualitative Research, edited by NK Denzin and YS Lincoln. Thousand Oaks, CA: Sage Publications.

Haddix, A., Teutsch, S., Shaffer, P., & Dunet, D. (1996). Prevention effectiveness: a guide to decision analysis and economic evaluation. New York, NY: Oxford University Press.

Hennessy, M. (1998). Evaluation. In Statistics in community health and development, edited by Stroup, D., & Teutsch, S. New York, NY: Oxford University Press, 193-219.

Henry, G. (1998). Graphing data. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications, 527-56.

Henry, G. (1998). Practical sampling. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications, 101-26.

Institute of Medicine. Improving health in the community: a role for performance monitoring . Washington, DC: National Academy Press, 1997.

Joint Committee on Educational Evaluation, James R. Sanders (Chair). The program evaluation standards: how to assess evaluations of educational programs . Thousand Oaks, CA: Sage Publications, 1994.

Kaplan, R., & Norton, D. (1992). The balanced scorecard: measures that drive performance. Harvard Business Review, Jan-Feb, 71-9.

Kar, S. (1989). Health promotion indicators and actions . New York, NY: Springer Publications.

Knauft, E. (1993).   What independent sector learned from an evaluation of its own hard-to -measure programs . In A vision of evaluation, edited by ST Gray. Washington, DC: Independent Sector.

Koplan, J. (1999)  CDC sets millennium priorities . US Medicine 4-7.

Lipsey, M. (1998). Design sensitivity: statistical power for applied experimental research. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications, 39-68.

Lipsey, M. (1993). Theory as method: small theories of treatments . New Directions for Program Evaluation;(57):5-38.

Lipsey, M. (1997).  What can you build with thousands of bricks? Musings on the cumulation of knowledge in program evaluation . New Directions for Evaluation; (76): 7-23.

Love, A.  (1991).  Internal evaluation: building organizations from within . Newbury Park, CA: Sage Publications.

Miles, M., & Huberman, A. (1994).  Qualitative data analysis: a sourcebook of methods . Thousand Oaks, CA: Sage Publications, Inc.

National Quality Program. (1999). National Quality Program. National Institute of Standards and Technology.

National Quality Program. (1999). Baldrige index outperforms S&P 500 for fifth year.

National Quality Program. (1998). Health care criteria for performance excellence.

Newcomer, K. (1994). Using statistics appropriately. In Handbook of practical program evaluation, edited by Wholey, J., Hatry, H., & Newcomer, K. San Francisco, CA: Jossey-Bass, 389-416.

Patton, M. (1990).  Qualitative evaluation and research methods . Newbury Park, CA: Sage Publications.

Patton, M (1997).  Toward distinguishing empowerment evaluation and placing it in a larger context . Evaluation Practice;18(2):147-63.

Patton, M. (1997).  Utilization-focused evaluation . Thousand Oaks, CA: Sage Publications.

Perrin, B. Effective use and misuse of performance measurement . American Journal of Evaluation 1998;19(3):367-79.

Perrin, E., & Koshel, J. (1997). Assessment of performance measures for community health and development, substance abuse, and mental health. Washington, DC: National Academy Press.

Phillips, J. (1997).  Handbook of training evaluation and measurement methods . Houston, TX: Gulf Publishing Company.

Porteous, N., Sheldrick, B., & Stewart, P. (1997). Program evaluation tool kit: a blueprint for community health and development management. Ottawa, Canada: Community Health and Development Research, Education, and Development Program, Ottawa-Carleton Health Department.

Posavac, E., & Carey, R. (1980). Program evaluation: methods and case studies. Englewood Cliffs, NJ: Prentice-Hall.

Preskill, H. & Torres R. (1998).  Evaluative inquiry for learning in organizations . Thousand Oaks, CA: Sage Publications.

Public Health Functions Project. (1996). The public health workforce: an agenda for the 21st century . Washington, DC: U.S. Department of Health and Human Services, Community health and development Service.

Public Health Training Network. (1998).  Practical evaluation of public health programs . CDC, Atlanta, GA.

Reichardt, C., & Mark M. (1998).  Quasi-experimentation . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications, 193-228.

Rossi, P., & Freeman H.  (1993).  Evaluation: a systematic approach . Newbury Park, CA: Sage Publications.

Rush, B., & Ogborne, A. (1995). Program logic models: expanding their role and structure for program planning and evaluation. Canadian Journal of Program Evaluation, 6, 95-106.

Sanders, J. (1993).  Uses of evaluation as a means toward organizational effectiveness. In A vision of evaluation , edited by ST Gray. Washington, DC: Independent Sector.

Schorr, L. (1997).   Common purpose: strengthening families and neighborhoods to rebuild America . New York, NY: Anchor Books, Doubleday.

Scriven, M. (1998). A minimalist theory of evaluation: the least theory that practice requires. American Journal of Evaluation.

Shadish, W., Cook, T., Leviton, L. (1991).  Foundations of program evaluation . Newbury Park, CA: Sage Publications.

Shadish, W. (1998). Evaluation theory is who we are. American Journal of Evaluation, 19(1), 1-19.

Shulha, L., & Cousins, J. (1997). Evaluation use: theory, research, and practice since 1986. Evaluation Practice, 18(3), 195-208.

Sieber, J. (1998).   Planning ethically responsible research . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications: 127-56.

Steckler, A., McLeroy, K., Goodman, R., Bird, S., & McCormick, L. (1992). Toward integrating qualitative and quantitative methods: an introduction. Health Education Quarterly, 19, 1-8.

Taylor-Powell, E., Rossing, B., Geran, J. (1998). Evaluating collaboratives: reaching the potential. Madison, Wisconsin: University of Wisconsin Cooperative Extension.

Teutsch, S. (1992). A framework for assessing the effectiveness of disease and injury prevention. Morbidity and Mortality Weekly Report: Recommendations and Reports, 41(RR-3), 1-13.

Torres, R., Preskill, H., Piontek, M., (1996).   Evaluation strategies for communicating and reporting: enhancing learning in organizations . Thousand Oaks, CA: Sage Publications.

Trochim, W. (1999). Research methods knowledge base.

United Way of America. Measuring program outcomes: a practical approach . Alexandria, VA: United Way of America, 1996.

U.S. General Accounting Office. Case study evaluations . GAO/PEMD-91-10.1.9. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. Designing evaluations . GAO/PEMD-10.1.4. Washington, DC: U.S. General Accounting Office, 1991.

U.S. General Accounting Office. Managing for results: measuring program results that are under limited federal control . GAO/GGD-99-16. Washington, DC: 1998.

U.S. General Accounting Office. Prospective evaluation methods: the prospective evaluation synthesis. GAO/PEMD-10.1.10. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. The evaluation synthesis . Washington, DC: U.S. General Accounting Office, 1992.

U.S. General Accounting Office. Using statistical sampling . Washington, DC: U.S. General Accounting Office, 1992.

Wandersman, A., Morrissey, E., Davino, K., Seybolt, D., Crusto, C., et al. Comprehensive quality programming and accountability: eight essential strategies for implementing successful prevention programs . Journal of Primary Prevention 1998;19(1):3-30.

Weiss, C. (1995). Nothing as practical as a good theory: exploring theory-based evaluation for comprehensive community initiatives for families and children. In New Approaches to Evaluating Community Initiatives, edited by Connell, J., Kubisch, A., Schorr, L., & Weiss, C. New York, NY: Aspen Institute.

Weiss, C. (1998).  Have we learned anything new about the use of evaluation? American Journal of Evaluation;19(1):21-33.

Weiss, C. (1997).  How can theory-based evaluation make greater headway? Evaluation Review 1997;21(4):501-24.

W.K. Kellogg Foundation. (1998). The W.K. Kellogg Foundation Evaluation Handbook. Battle Creek, MI: W.K. Kellogg Foundation.

Wong-Reiger, D.,& David, L. (1995).  Using program logic models to plan and evaluate education and prevention programs. In Evaluation Methods Sourcebook II, edited by Love. A.J. Ottawa, Ontario: Canadian Evaluation Society.

Wholey, J., Hatry, H., & Newcomer, K. (2010). Handbook of Practical Program Evaluation. Jossey-Bass. This book serves as a comprehensive guide to the evaluation process and its practical applications for sponsors, program managers, and evaluators.

Yarbrough, D.B., Shulha, L.M., Hopson, R.K., & Caruthers, F.A. (2011). The Program Evaluation Standards: A Guide for Evaluators and Evaluation Users (3rd ed.). Sage Publications.

Yin, R. (1988).  Case study research: design and methods . Newbury Park, CA: Sage Publications.

About the journal

Research Evaluation is an interdisciplinary peer-reviewed, international journal. Its subject matter is the evaluation of activities concerned with scientific research, technological development and innovation …


Evaluation Research

Evaluation research can be defined as a type of study that uses standard social research methods specifically for evaluative purposes, perhaps to assess the results of an intervention. Did the intervention meet its goal? Were there any unanticipated consequences? Some research methods are designed to be used as evaluation tools and employ dedicated techniques to this end. These include input measurement; performance measurement; impact assessment; service quality assessment; process evaluation; benchmarking; standards; quantitative methods; qualitative methods and methods drawn from Human-Computer Interaction (Powell, 2006).

Evaluation: GO-GN Insights


“I think it was so highly reflexive that it could be interpreted as circular; so a disadvantage was the cycles and circles of evaluation; I was answering the research questions each time with the criteria set filters; this resulted in me writing a LOT about what the resources did according to the three sets of criteria; in three cycles of evaluation and interrogation. Pedantic is the word I would use. It did have a feel of luxury to it, though; being able to really concentrate on the processes in the resources down to a granular level, to see it from a number of perspectives and try to get right down to the mechanisms that helped make the resources different and more collaborative. This ‘search for the things’ was a bit circular and I had to find the things that were also not collaborative; that’s the thing about looking for best practice; you also have to compare it to what’s ‘not good’ in the resource but also know that there are relativity issues with what ‘good’ means, and to whom. So having a bird’s eye view on who the stakeholders are is helpful; as ‘knowledge management tools,’ learning resources have agenda-pushing potential we might not recognize.”

Francisco Iniesto devised an accessibility audit and then used it to evaluate the current accessibility of MOOCs from 4 major platforms: FutureLearn, edX, Coursera and Canvas. This evaluation comprised 4 components: technical accessibility, user experience (UX), quality and learning design; 10 experts were involved in its design and validation.

“The combination of qualitative studies through interviews with MOOC providers and learners and the quantitative information provided by the MOOC survey data has provided an in-depth and multi-faceted insight into accessibility needs of MOOC learners. The MOOC accessibility audit has helped to identify accessibility barriers and the audit provides a tool that can be used and iteratively developed further to support the design and evaluation of MOOCs for accessibility. Interviews have involved MOOC providers and MOOC researchers. The aim was to explore the perspectives of platform and course developers on the importance of accessibility of the MOOC environment. The data from this study was useful to understand how to approach the next steps in this research. Interviewing individuals involved in MOOC development helped to understand how they cater for disabled learners, and the approaches they use to design accessible MOOCs. Additional evaluation involved disabled learners who had participated in learning via MOOCs. Learners were a useful source of data to explore the accessibility barriers and their solutions in using the technology and the learning designs they come up against when interacting with MOOCs. The data from the interviews helped to understand their motivations, the current accessibility barriers they have found, how they reacted to them, and their suggestions for desired solutions. Qualitative methods can help to explore a new area of research; the use of surveys in my case helped to identify students to be interviewed to develop an understanding of their perspective on MOOCs.”

Useful references for Evaluation Research: Chang & Little (2018); Patton (2010); Powell (2006); Rutman (1977)

Research Methods Handbook Copyright © 2020 by Rob Farrow; Francisco Iniesto; Martin Weller; and Rebecca Pitt is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.


Evaluation Research: Definition, Methods and Examples


Content Index

  • What is evaluation research?
  • Why do evaluation research?
  • Methods of evaluation research: quantitative and qualitative
  • Process evaluation research question examples
  • Outcome evaluation research question examples

What is evaluation research?

Evaluation research, also known as program evaluation, refers to a research purpose rather than a specific method. It is the systematic assessment of the worth or merit of the time, money, effort, and resources spent in order to achieve a goal.

Evaluation research is closely related to, but slightly different from, more conventional social research. It uses many of the same methods, but because it takes place within an organizational context, it requires team, interpersonal, management, and political skills that conventional social research rarely demands. Evaluation research also requires keeping the interests of stakeholders in mind.

Evaluation research is a type of applied research, and so it is intended to have some real-world effect. Many methods, such as surveys and experiments, can be used for evaluation research. It is a rigorous, systematic process that involves collecting and analyzing data about organizations, processes, projects, services, and/or resources, and reporting the results. Evaluation research enhances knowledge and decision-making, and leads to practical applications.


Why do evaluation research?

The common goal of most evaluations is to extract meaningful information from the audience and provide valuable insights to evaluators such as sponsors, donors, client groups, administrators, staff, and other relevant constituencies. Most often, feedback is perceived as valuable if it helps in decision-making. However, evaluation findings do not always have an impact that can be applied elsewhere; they sometimes fail to influence short-term decisions, or appear at first to have no influence only to produce a delayed impact when the situation becomes more favorable. In spite of this, there is general agreement that the major goal of evaluation research should be to improve decision-making through the systematic utilization of measurable feedback.

Below are some of the benefits of evaluation research:

  • Gain insights about a project or program and its operations

Evaluation research lets you understand what works and what doesn’t, where you were, where you are, and where you are headed. You can find areas of improvement and identify strengths, which helps you figure out what you need to focus on and whether there are any threats to your business. You can also find out if there are hidden sectors in the market that are yet untapped.

  • Improve practice

It is essential to gauge your past performance and understand what went wrong in order to deliver better services to your customers. Unless communication is two-way, there is no way to improve what you have to offer. Evaluation research gives your employees and customers an opportunity to express how they feel and whether there is anything they would like to change. It also lets you modify or adapt a practice to increase the chances of success.

  • Assess the effects

After evaluating your efforts, you can see how well you are meeting objectives and targets. Evaluations let you measure whether the intended benefits are really reaching the targeted audience and, if so, how effectively.

  • Build capacity

Evaluations help you analyze demand patterns and predict whether you will need more funds, upgraded skills, or more efficient operations. They let you find the gaps in the production-to-delivery chain and possible ways to fill them.

Methods of evaluation research

All market research methods involve collecting and analyzing data, making decisions about the validity of the information, and deriving relevant inferences from it. Evaluation research comprises planning, conducting, and analyzing the results, which includes the use of data collection techniques and the application of statistical methods.

Some popular evaluation methods are input measurement, output or performance measurement, impact or outcomes assessment, quality assessment, process evaluation, benchmarking, standards, cost analysis, organizational effectiveness, program evaluation methods, and LIS-centered methods. A few types of evaluation do not always result in a meaningful assessment, such as descriptive studies, formative evaluations, and implementation analysis. Evaluation research is concerned mainly with the information-processing and feedback functions of evaluation.
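As a small illustration of one of these techniques, the sketch below (not from the article; all program names and figures are invented) computes a simple cost-effectiveness ratio, i.e. cost per unit of outcome, for two hypothetical program variants:

```python
# Hypothetical illustration of cost analysis: cost per unit of outcome.
# All names and figures below are invented for this sketch.

def cost_effectiveness(total_cost: float, outcomes_achieved: int) -> float:
    """Return the cost per unit of outcome (e.g., dollars per participant helped)."""
    if outcomes_achieved <= 0:
        raise ValueError("outcomes_achieved must be positive")
    return total_cost / outcomes_achieved

# Two hypothetical program variants: B achieves more outcomes per dollar.
program_a = cost_effectiveness(total_cost=50_000, outcomes_achieved=200)
program_b = cost_effectiveness(total_cost=80_000, outcomes_achieved=400)
print(f"Program A: ${program_a:,.2f} per outcome")  # $250.00
print(f"Program B: ${program_b:,.2f} per outcome")  # $200.00
```

A lower ratio means the program converts resources into outcomes more efficiently, though a real cost analysis would also weigh outcome quality and unintended effects.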

These methods can be broadly classified as quantitative and qualitative methods.

Quantitative methods

Quantitative research methods produce answers to the questions below and are used to measure anything tangible.

  • Who was involved?
  • What were the outcomes?
  • What was the price?

The best way to collect quantitative data is through surveys , questionnaires , and polls . You can also create pre-tests and post-tests, review existing documents and databases or gather clinical data.

Surveys are used to gather the opinions, feedback, or ideas of your employees or customers and consist of various question types. They can be conducted face-to-face, by telephone, by mail, or online. Online surveys do not require human intervention and are far more efficient and practical. You can see the results on a dashboard and dig deeper using filter criteria based on factors such as age, gender, and location. You can also apply survey logic such as branching, quotas, chained surveys, and looping to reduce the time needed to both create and respond to a survey, and generate reports that involve statistical formulae and present data that can be readily absorbed in meetings.


Quantitative data measure the depth and breadth of an initiative, for instance, the number of people who participated in a non-profit event or the number of students who enrolled in a new course at a university. Quantitative data collected before and after a program can show its results and impact.
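For instance, here is a minimal sketch (not from the article; the test scores are invented) of how pre- and post-program data can be summarized to show a program's impact:

```python
# Hypothetical pre/post comparison: the same 8 participants tested before
# and after a program. All scores are invented for illustration.
from statistics import mean, stdev

pre  = [52, 61, 48, 70, 55, 63, 59, 66]   # knowledge test before the program
post = [64, 70, 55, 78, 60, 71, 66, 74]   # same participants afterward

gains = [after - before for before, after in zip(pre, post)]
print(f"Mean gain: {mean(gains):.1f} points "
      f"(sd {stdev(gains):.1f}, n = {len(gains)})")
# Mean gain: 8.0 points (sd 2.0, n = 8)
```

A real evaluation would go further (e.g., a significance test and a comparison group), but even a simple paired summary like this makes the before/after change visible.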

The accuracy of quantitative data to be used for evaluation research depends on how well the sample represents the population, the ease of analysis, and their consistency. Quantitative methods can fail if the questions are not framed correctly and not distributed to the right audience. Also, quantitative data do not provide an understanding of the context and may not be apt for complex issues.


Qualitative methods

Qualitative research methods are used where quantitative methods cannot solve the research problem, i.e., to measure intangible values. They answer questions such as:

  • What is the value added?
  • How satisfied are you with our service?
  • How likely are you to recommend us to your friends?
  • What will improve your experience?


Qualitative data are collected through observation, interviews, case studies, and focus groups. Creating a qualitative study involves examining, comparing and contrasting, and understanding patterns. Analysts draw conclusions by identifying themes, clustering similar data, and finally reducing them to points that make sense.

Observations may help explain behaviors as well as the social context that is generally not discovered by quantitative methods. Observations of behavior and body language can be done by watching a participant, recording audio or video. Structured interviews can be conducted with people alone or in a group under controlled conditions, or they may be asked open-ended qualitative research questions . Qualitative research methods are also used to understand a person’s perceptions and motivations.


The strength of this method is that group discussion can generate ideas and stimulate memories, with topics cascading as the discussion unfolds. The accuracy of qualitative data depends on how well the contextual data explain complex issues and complement quantitative data. Qualitative data help answer "why" and "how" after "what" has been answered. Their limitations for evaluation research are that they are subjective, time-consuming, costly, and difficult to analyze and interpret.


Survey software can be used for both evaluation research methods. You can use the sample questions above and send a survey in minutes using research software. A research tool simplifies the process end to end: creating a survey, importing contacts, distributing the survey, and generating reports that aid in research.

Examples of evaluation research

Evaluation research questions lay the foundation of a successful evaluation. They define the topics that will be evaluated. Keeping evaluation questions ready not only saves time and money, but also makes it easier to decide what data to collect, how to analyze it, and how to report it.

Evaluation research questions should be developed and agreed on in the planning stage; alternatively, ready-made research templates can be used.

Process evaluation research question examples:

  • How often do you use our product in a day?
  • Were approvals taken from all stakeholders?
  • Can you report the issue from the system?
  • Can you submit the feedback from the system?
  • Was each task done as per the standard operating procedure?
  • What were the barriers to the implementation of each task?
  • Were any improvement areas discovered?

Outcome evaluation research question examples:

  • How satisfied are you with our product?
  • Did the program produce intended outcomes?
  • What were the unintended outcomes?
  • Has the program increased the knowledge of participants?
  • Were the participants of the program employable before the course started?
  • Do participants of the program have the skills to find a job after the course ended?
  • Is the knowledge of participants better compared to those who did not participate in the program?





National Research Council (US) Panel on the Evaluation of AIDS Interventions; Coyle SL, Boruch RF, Turner CF, editors. Evaluating AIDS Prevention Programs: Expanded Edition. Washington (DC): National Academies Press (US); 1991.


1 Design and Implementation of Evaluation Research

Evaluation has its roots in the social, behavioral, and statistical sciences, and it relies on their principles and methodologies of research, including experimental design, measurement, statistical tests, and direct observation. What distinguishes evaluation research from other social science is that its subjects are ongoing social action programs that are intended to produce individual or collective change. This setting usually engenders a great need for cooperation between those who conduct the program and those who evaluate it. This need for cooperation can be particularly acute in the case of AIDS prevention programs because those programs have been developed rapidly to meet the urgent demands of a changing and deadly epidemic.

Although the characteristics of AIDS intervention programs place some unique demands on evaluation, the techniques for conducting good program evaluation do not need to be invented. Two decades of evaluation research have provided a basic conceptual framework for undertaking such efforts (see, e.g., Campbell and Stanley [1966] and Cook and Campbell [1979] for discussions of outcome evaluation; see Weiss [1972] and Rossi and Freeman [1982] for process and outcome evaluations); in addition, similar programs, such as the antismoking campaigns, have been subject to evaluation, and they offer examples of the problems that have been encountered.

In this chapter the panel provides an overview of the terminology, types, designs, and management of evaluation research. The following chapter provides an overview of program objectives and the selection and measurement of appropriate outcome variables for judging the effectiveness of AIDS intervention programs. These issues are discussed in detail in the subsequent, program-specific Chapters 3-5.

  • Types of Evaluation

The term evaluation implies a variety of different things to different people. The recent report of the Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences defines the area through a series of questions (Turner, Miller, and Moses, 1989:317-318):

Evaluation is a systematic process that produces a trustworthy account of what was attempted and why; through the examination of results—the outcomes of intervention programs—it answers the questions, "What was done?" "To whom, and how?" and "What outcomes were observed?" Well-designed evaluation permits us to draw inferences from the data and addresses the difficult question: "What do the outcomes mean?"

These questions differ in how difficult they are to answer. An evaluation that tries to determine the outcomes of an intervention and what those outcomes mean is a more complicated endeavor than an evaluation that assesses the process by which the intervention was delivered. Both kinds of evaluation are necessary because they are intimately connected: to establish a project's success, an evaluator must first ask whether the project was implemented as planned and then whether its objective was achieved. Questions about a project's implementation usually fall under the rubric of process evaluation. If the investigation involves rapid feedback to the project staff or sponsors, particularly at the earliest stages of program implementation, the work is called formative evaluation. Questions about effects or effectiveness are variously called summative evaluation, impact assessment, or outcome evaluation, the term the panel uses.

Formative evaluation is a special type of early evaluation that occurs during and after a program has been designed but before it is broadly implemented. Formative evaluation is used to understand the need for the intervention and to make tentative decisions about how to implement or improve it. During formative evaluation, information is collected and then fed back to program designers and administrators to enhance program development and maximize the success of the intervention. For example, formative evaluation may be carried out through a pilot project before a program is implemented at several sites. A pilot study of a community-based organization (CBO), for example, might be used to gather data on problems involving access to and recruitment of targeted populations and the utilization and implementation of services; the findings of such a study would then be used to modify (if needed) the planned program.

Another example of formative evaluation is the use of a "story board" design of a TV message that has yet to be produced. A story board is a series of text and sketches of camera shots that are to be produced in a commercial. To evaluate the effectiveness of the message and forecast some of the consequences of actually broadcasting it to the general public, an advertising agency convenes small groups of people to react to and comment on the proposed design.

Once an intervention has been implemented, the next stage of evaluation is process evaluation, which addresses two broad questions: "What was done?" and "To whom, and how?" Ordinarily, process evaluation is carried out at some point in the life of a project to determine how and how well the delivery goals of the program are being met. When intervention programs continue over a long period of time (as is the case for some of the major AIDS prevention programs), measurements at several times are warranted to ensure that the components of the intervention continue to be delivered by the right people, to the right people, in the right manner, and at the right time. Process evaluation can also play a role in improving interventions by providing the information necessary to change delivery strategies or program objectives in a changing epidemic.

Research designs for process evaluation include direct observation of projects, surveys of service providers and clients, and the monitoring of administrative records. The panel notes that the Centers for Disease Control (CDC) is already collecting some administrative records on its counseling and testing program and community-based projects. The panel believes that this type of evaluation should be a continuing and expanded component of intervention projects to guarantee the maintenance of the projects' integrity and responsiveness to their constituencies.

The purpose of outcome evaluation is to identify consequences and to establish that consequences are, indeed, attributable to a project. This type of evaluation answers the questions, "What outcomes were observed?" and, perhaps more importantly, "What do the outcomes mean?" Like process evaluation, outcome evaluation can also be conducted at intervals during an ongoing program, and the panel believes that such periodic evaluation should be done to monitor goal achievement.

The panel believes that these stages of evaluation (i.e., formative, process, and outcome) are essential to learning how AIDS prevention programs contribute to containing the epidemic. After a body of findings has been accumulated from such evaluations, it may be fruitful to launch another stage of evaluation: cost-effectiveness analysis (see Weinstein et al., 1989). Like outcome evaluation, cost-effectiveness analysis also measures program effectiveness, but it extends the analysis by adding a measure of program cost. The panel believes that consideration of cost-effectiveness analysis should be postponed until more experience is gained with formative, process, and outcome evaluation of the CDC AIDS prevention programs.

  • Evaluation Research Design

Process and outcome evaluations require different types of research designs, as discussed below. Formative evaluations, which are intended to both assess implementation and forecast effects, use a mix of these designs.

Process Evaluation Designs

To conduct process evaluations on how well services are delivered, data need to be gathered on the content of interventions and on their delivery systems. Suggested methodologies include direct observation, surveys, and record keeping.

Direct observation designs include case studies, in which participant-observers unobtrusively and systematically record encounters within a program setting, and nonparticipant observation, in which long, open-ended (or "focused") interviews are conducted with program participants. 1 For example, "professional customers" at counseling and testing sites can act as project clients to monitor activities unobtrusively; 2 alternatively, nonparticipant observers can interview both staff and clients. Surveys—either censuses (of the whole population of interest) or samples—elicit information through interviews or questionnaires completed by project participants or potential users of a project. For example, surveys within community-based projects can collect basic statistical information on project objectives, what services are provided, to whom, when, how often, for how long, and in what context.

Record keeping consists of administrative or other reporting systems that monitor use of services. Standardized reporting ensures consistency in the scope and depth of data collected. To use the media campaign as an example, the panel suggests using standardized data on the use of the AIDS hotline to monitor public attentiveness to the advertisements broadcast by the media campaign.

These designs are simple to understand, but they require expertise to implement. For example, observational studies must be conducted by people who are well trained in how to carry out on-site tasks sensitively and to record their findings uniformly. Observers can either complete narrative accounts of what occurred in a service setting or they can complete some sort of data inventory to ensure that multiple aspects of service delivery are covered. These types of studies are time consuming and benefit from corroboration among several observers. The use of surveys in research is well-understood, although they, too, require expertise to be well implemented. As the program chapters reflect, survey data collection must be carefully designed to reduce problems of validity and reliability and, if samples are used, to design an appropriate sampling scheme. Record keeping or service inventories are probably the easiest research designs to implement, although preparing standardized internal forms requires attention to detail about salient aspects of service delivery.

Outcome Evaluation Designs

Research designs for outcome evaluations are meant to assess principal and relative effects. Ideally, to assess the effect of an intervention on program participants, one would like to know what would have happened to the same participants in the absence of the program. Because it is not possible to make this comparison directly, inference strategies that rely on proxies have to be used. Scientists use three general approaches to construct proxies for use in the comparisons required to evaluate the effects of interventions: (1) nonexperimental methods, (2) quasi-experiments, and (3) randomized experiments. The first two are discussed below, and randomized experiments are discussed in the subsequent section.

Nonexperimental and Quasi-Experimental Designs 3

The most common form of nonexperimental design is a before-and-after study. In this design, pre-intervention measurements are compared with equivalent measurements made after the intervention to detect change in the outcome variables that the intervention was designed to influence.
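A minimal sketch of the comparison this design supports, using hypothetical pre- and post-intervention scores and a paired t statistic (Python standard library only):

```python
import statistics

def pre_post_change(pre, post):
    """Mean change and paired t statistic for a before-and-after design."""
    diffs = [after - before for before, after in zip(pre, post)]
    mean_change = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                # sample std. dev. of changes
    t = mean_change / (sd / len(diffs) ** 0.5)  # paired t statistic
    return mean_change, t

# Hypothetical knowledge scores for four participants, before and after.
mean_change, t = pre_post_change([1, 2, 3, 4], [3, 3, 6, 6])
```

Note that even a large t statistic establishes only that scores changed, not that the intervention caused the change.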

Although the panel finds that before-and-after studies frequently provide helpful insights, the panel believes that these studies do not provide sufficiently reliable information to be the cornerstone for evaluation research on the effectiveness of AIDS prevention programs. The panel's conclusion follows from the fact that the postintervention changes cannot usually be attributed unambiguously to the intervention. 4 Plausible competing explanations for differences between pre-and postintervention measurements will often be numerous, including not only the possible effects of other AIDS intervention programs, news stories, and local events, but also the effects that may result from the maturation of the participants and the educational or sensitizing effects of repeated measurements, among others.

Quasi-experimental and matched control designs provide a separate comparison group. In these designs, the control group may be selected by matching nonparticipants to participants in the treatment group on the basis of selected characteristics. It is difficult to ensure the comparability of the two groups even when they are matched on many characteristics because other relevant factors may have been overlooked or mismatched or they may be difficult to measure (e.g., the motivation to change behavior). In some situations, it may simply be impossible to measure all of the characteristics of the units (e.g., communities) that may affect outcomes, much less demonstrate their comparability.
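The matching step can be sketched as exact matching on a few selected characteristics. The records and characteristics below are hypothetical, and the sketch illustrates the mechanics rather than endorsing the design:

```python
def match_controls(participants, pool, keys):
    """Pair each participant with the first unused nonparticipant
    that matches exactly on the selected characteristics."""
    used, pairs = set(), []
    for person in participants:
        for i, candidate in enumerate(pool):
            if i not in used and all(person[k] == candidate[k] for k in keys):
                used.add(i)
                pairs.append((person, candidate))
                break
    return pairs

# Hypothetical treatment-group members and a pool of nonparticipants.
treated = [{"age": 30, "sex": "F"}, {"age": 40, "sex": "M"}]
pool = [{"age": 40, "sex": "M"}, {"age": 30, "sex": "F"}, {"age": 25, "sex": "F"}]
matched = match_controls(treated, pool, ["age", "sex"])
```

Matching only on age and sex leaves every unmeasured characteristic (e.g., motivation to change behavior) uncontrolled, which is precisely the weakness described above.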

Matched control designs require extraordinarily comprehensive scientific knowledge about the phenomenon under investigation in order for evaluators to be confident that all of the relevant determinants of outcomes have been properly accounted for in the matching. Three types of information or knowledge are required: (1) knowledge of intervening variables that also affect the outcome of the intervention and, consequently, need adjustment to make the groups comparable; (2) measurements on all intervening variables for all subjects; and (3) knowledge of how to make the adjustments properly, which in turn requires an understanding of the functional relationship between the intervening variables and the outcome variables. Satisfying each of these information requirements is likely to be more difficult than answering the primary evaluation question, "Does this intervention produce beneficial effects?"

Given the size and the national importance of AIDS intervention programs and given the state of current knowledge about behavior change in general and AIDS prevention, in particular, the panel believes that it would be unwise to rely on matching and adjustment strategies as the primary design for evaluating AIDS intervention programs. With differently constituted groups, inferences about results are hostage to uncertainty about the extent to which the observed outcome actually results from the intervention and is not an artifact of intergroup differences that may not have been removed by matching or adjustment.

Randomized Experiments

A remedy to the inferential uncertainties that afflict nonexperimental designs is provided by randomized experiments. In such experiments, one singly constituted group is established for study. A subset of the group is then randomly chosen to receive the intervention, with the other subset becoming the control. The two groups are not identical, but they are comparable. Because they are two random samples drawn from the same population, they are not systematically different in any respect, which is important for all variables—both known and unknown—that can influence the outcome. Dividing a singly constituted group into two random and therefore comparable subgroups cuts through the tangle of causation and establishes a basis for the valid comparison of respondents who do and do not receive the intervention. Randomized experiments provide for clear causal inference by solving the problem of group comparability, and may be used to answer the evaluation questions "Does the intervention work?" and "What works better?"

Which question is answered depends on whether the controls receive an intervention or not. When the object is to estimate whether a given intervention has any effects, individuals are randomly assigned to the project or to a zero-treatment control group. The control group may be put on a waiting list or simply not get the treatment. This design addresses the question, "Does it work?"

When the object is to compare variations on a project—e.g., individual counseling sessions versus group counseling—then individuals are randomly assigned to these two regimens, and there is no zero-treatment control group. This design addresses the question, "What works better?" In either case, the control groups must be followed up as rigorously as the experimental groups.

A randomized experiment requires that individuals, organizations, or other treatment units be randomly assigned to one of two or more treatments or program variations. Random assignment ensures that the estimated differences between the groups so constituted are statistically unbiased; that is, that any differences in effects measured between them are a result of treatment. The absence of statistical bias in groups constituted in this fashion stems from the fact that random assignment ensures that there are no systematic differences between them, differences that can and usually do affect groups composed in ways that are not random. 5 The panel believes this approach is far superior for outcome evaluations of AIDS interventions to the nonrandom and quasi-experimental approaches. Therefore,
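Randomly dividing one singly constituted group into comparable treatment and control subgroups is simple to sketch; the clinic identifiers below are hypothetical:

```python
import random

def randomize(units, seed=None):
    """Randomly divide a singly constituted group of treatment units
    into comparable treatment and control subgroups."""
    rng = random.Random(seed)   # seed only for reproducibility
    shuffled = list(units)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # treatment, control

# Eight hypothetical clinic IDs assigned to two arms.
treatment, control = randomize(["c1", "c2", "c3", "c4",
                                "c5", "c6", "c7", "c8"], seed=1)
```

Because assignment depends only on the random draw, nothing about a unit (known or unknown) can systematically place it in one arm rather than the other.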

To improve interventions that are already broadly implemented, the panel recommends the use of randomized field experiments of alternative or enhanced interventions.

Under certain conditions, the panel also endorses randomized field experiments with a nontreatment control group to evaluate new interventions. In the context of a deadly epidemic, ethics dictate that treatment not be withheld simply for the purpose of conducting an experiment. Nevertheless, there may be times when a randomized field test of a new treatment with a no-treatment control group is worthwhile. One such time is during the design phase of a major or national intervention.

Before a new intervention is broadly implemented, the panel recommends that it be pilot tested in a randomized field experiment.

The panel considered the use of experiments with delayed rather than no treatment. A delayed-treatment control group strategy might be pursued when resources are too scarce for an intervention to be widely distributed at one time. For example, a project site that is waiting to receive funding for an intervention would be designated as the control group. If it is possible to randomize which projects in the queue receive the intervention, an evaluator could measure and compare outcomes after the experimental group had received the new treatment but before the control group received it. The panel believes that such a design can be applied only in limited circumstances, such as when groups would have access to related services in their communities and when conducting the study was likely to lead to greater access or better services. For example, a study cited in Chapter 4 used a randomized delayed-treatment experiment to measure the effects of a community-based risk reduction program. However, such a strategy may be impractical for several reasons, including:

  • sites waiting for funding for an intervention might seek resources from another source;
  • it might be difficult to enlist the nonfunded site and its clients to participate in the study;
  • there could be an appearance of favoritism toward projects whose funding was not delayed.

Although randomized experiments have many benefits, the approach is not without pitfalls. In the planning stages of evaluation, it is necessary to contemplate certain hazards, such as the Hawthorne effect 6 and differential project dropout rates. Precautions must be taken either to prevent these problems or to measure their effects. Fortunately, there is some evidence suggesting that the Hawthorne effect is usually not very large (Rossi and Freeman, 1982:175-176).

Attrition is potentially more damaging to an evaluation, and it must be limited if the experimental design is to be preserved. If sample attrition is not limited in an experimental design, it becomes necessary to account for the potentially biasing impact of the loss of subjects in the treatment and control conditions of the experiment. The statistical adjustments required to make inferences about treatment effectiveness in such circumstances can introduce uncertainties that are as worrisome as those afflicting nonexperimental and quasi-experimental designs. Thus, the panel's recommendation of the selective use of randomized design carries an implicit caveat: To realize the theoretical advantages offered by randomized experimental designs, substantial efforts will be required to ensure that the designs are not compromised by flawed execution.

Another pitfall of randomization is its appearance of unfairness or unattractiveness to participants and the controversial legal and ethical issues it sometimes raises. Often, what is being criticized is the control of project assignment of participants rather than the use of randomization itself. In deciding whether random assignment is appropriate, it is important to consider the specific context of the evaluation and how participants would be assigned to projects in the absence of randomization. The Federal Judicial Center (1981) offers five threshold conditions for the use of random assignment:

  • Does present practice or policy need improvement?
  • Is there significant uncertainty about the value of the proposed regimen?
  • Are there acceptable alternatives to randomized experiments?
  • Will the results of the experiment be used to improve practice or policy?
  • Is there a reasonable protection against risk for vulnerable groups (i.e., individuals within the justice system)?

The parent committee has argued that these threshold conditions apply in the case of AIDS prevention programs (see Turner, Miller, and Moses, 1989:331-333).

Although randomization may be desirable from an evaluation and ethical standpoint, and acceptable from a legal standpoint, it may be difficult to implement from a practical or political standpoint. Again, the panel emphasizes that questions about the practical or political feasibility of the use of randomization may in fact refer to the control of program allocation rather than to the issues of randomization itself. In fact, when resources are scarce, it is often more ethical and politically palatable to randomize allocation rather than to allocate on grounds that may appear biased.

It is usually easier to defend the use of randomization when the choice has to do with assignment to groups receiving alternative services than when the choice involves assignment to groups receiving no treatment. For example, in comparing a testing and counseling intervention that offered a special "skills training" session in addition to its regular services with a counseling and testing intervention that offered no additional component, random assignment of participants to one group rather than another may be acceptable to program staff and participants because the relative values of the alternative interventions are unknown.

The more difficult issue is the introduction of new interventions that are perceived to be needed and effective in a situation in which there are no services. An argument that is sometimes offered against the use of randomization in this instance is that interventions should be assigned on the basis of need (perhaps as measured by rates of HIV incidence or of high-risk behaviors). But this argument presumes that the intervention will have a positive effect—which is unknown before evaluation—and that relative need can be established, which is a difficult task in itself.

The panel recognizes that community and political opposition to randomization to zero treatments may be strong and that enlisting participation in such experiments may be difficult. This opposition and reluctance could seriously jeopardize the production of reliable results if it is translated into noncompliance with a research design. The feasibility of randomized experiments for AIDS prevention programs has already been demonstrated, however (see the review of selected experiments in Turner, Miller, and Moses, 1989:327-329). The substantial effort involved in mounting randomized field experiments is repaid by the fact that they can provide unbiased evidence of the effects of a program.

Unit of Assignment.

The unit of assignment of an experiment may be an individual person, a clinic (i.e., the clientele of the clinic), or another organizational unit (e.g., the community or city). The treatment unit is selected at the earliest stage of design. Variations of units are illustrated in the following four examples of intervention programs.

  1. Two different pamphlets (A and B) on the same subject (e.g., testing) are distributed in an alternating sequence to individuals calling an AIDS hotline. The outcome to be measured is whether the recipient returns a card asking for more information.

  2. Two instruction curricula (A and B) about AIDS and HIV infections are prepared for use in high school driver education classes. The outcome to be measured is a score on a knowledge test.

  3. Of all clinics for sexually transmitted diseases (STDs) in a large metropolitan area, some are randomly chosen to introduce a change in the fee schedule. The outcome to be measured is the change in patient load.

  4. A coordinated set of community-wide interventions—involving community leaders, social service agencies, the media, community associations, and other groups—is implemented in one area of a city. Outcomes are knowledge as assessed by testing at drug treatment centers and STD clinics and condom sales in the community's retail outlets.

In example (1), the treatment unit is an individual person who receives pamphlet A or pamphlet B. If either "treatment" is applied again, it would be applied to a person. In example (2), the high school class is the treatment unit; everyone in a given class experiences either curriculum A or curriculum B. If either treatment is applied again, it would be applied to a class. The treatment unit is the clinic in example (3), and in example (4), the treatment unit is a community.

The consistency of the effects of a particular intervention across repetitions justly carries a heavy weight in appraising the intervention. It is important to remember that repetitions of a treatment or intervention are the number of treatment units to which the intervention is applied. This is a salient principle in the design and execution of intervention programs as well as in the assessment of their results.

The adequacy of the proposed sample size (number of treatment units) has to be considered in advance. Adequacy depends mainly on two factors:

  • How much variation occurs from unit to unit among units receiving a common treatment? If that variation is large, then the number of units needs to be large.
  • What is the minimum size of a possible treatment difference that, if present, would be practically important? That is, how small a treatment difference is it essential to detect if it is present? The smaller this quantity, the larger the number of units that are necessary.

Many formal methods for considering and choosing sample size exist (see, e.g., Cohen, 1988). Practical circumstances occasionally allow choosing between designs that involve units at different levels; thus, a classroom might be the unit if the treatment is applied in one way, but an entire school might be the unit if the treatment is applied in another. When both approaches are feasible, the use of a power analysis for each approach may lead to a reasoned choice.
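One common formal method, a normal-approximation sample-size formula for comparing two group means, makes the two factors above explicit: the required per-group size grows with unit-to-unit variation (sigma) and shrinks as the minimum practically important difference (delta) grows. A sketch, assuming a two-sided test:

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two group means.
    sigma: unit-to-unit standard deviation under a common treatment.
    delta: smallest treatment difference it is essential to detect."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided significance criterion
    z_beta = z(power)           # criterion for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detecting a half-standard-deviation difference at 80% power
# requires roughly 63 units per group.
print(n_per_group(sigma=1.0, delta=0.5))  # → 63
```

Halving delta roughly quadruples the required number of units, which is why the minimum important difference must be settled before the design is fixed.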

Choice of Methods

There is some controversy about the advantages of randomized experiments in comparison with other evaluative approaches. It is the panel's belief that when a (well executed) randomized study is feasible, it is superior to alternative kinds of studies in the strength and clarity of whatever conclusions emerge, primarily because the experimental approach avoids selection biases. 7 Other evaluation approaches are sometimes unavoidable, but ordinarily the accumulation of valid information will go more slowly and less securely than in randomized approaches.

Experiments in medical research shed light on the advantages of carefully conducted randomized experiments. The Salk vaccine trials are a successful example of a large, randomized study. In a double-blind test8 of the polio vaccine, children in various communities were randomly assigned to two treatments, either the vaccine or a placebo. By this method, the effectiveness of the Salk vaccine was demonstrated in one summer of research (Meier, 1957).

A sufficient accumulation of relevant observational information, especially when collected in studies using different procedures and sample populations, may also clearly demonstrate the effectiveness of a treatment or intervention. The process of accumulating such information can be a long one, however. When a well-executed randomized study is feasible, it can provide evidence that is subject to less uncertainty in its interpretation, and it can often do so in a more timely fashion. In the midst of an epidemic, the panel believes it proper that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention efforts. In making this recommendation, however, the panel emphasizes that the advantages of the randomized experimental design can be squandered by poor execution (e.g., compromised assignment of subjects or high subject attrition); care must therefore be taken to ensure that the integrity of the design is not undermined.
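The selection-bias problem that randomization avoids can be illustrated with a toy simulation (a sketch of the general statistical point, not an analysis from the report): when a latent trait such as motivation drives both who enrolls and how people fare, a naive comparison of self-selected participants against non-participants shows a large "effect" even for an intervention that does nothing, while random assignment recovers the true (null) effect.

```python
import random
from statistics import mean

random.seed(42)
TRUE_EFFECT = 0.0  # the simulated intervention does nothing at all

# A latent "motivation" trait influences both outcomes and volunteering.
motivation = [random.gauss(0.0, 1.0) for _ in range(20_000)]

def outcome(m, treated):
    """Observed outcome: driven by motivation plus any real treatment effect."""
    return m + (TRUE_EFFECT if treated else 0.0)

# Self-selection: the more motivated volunteer; the rest act as "controls".
volunteers = [outcome(m, True) for m in motivation if m > 0]
nonvolunteers = [outcome(m, False) for m in motivation if m <= 0]
naive_estimate = mean(volunteers) - mean(nonvolunteers)  # large and spurious

# Randomization: assignment is independent of motivation.
treated = [outcome(m, True) for m in motivation[::2]]
control = [outcome(m, False) for m in motivation[1::2]]
randomized_estimate = mean(treated) - mean(control)  # close to the true 0

print(f"naive: {naive_estimate:.2f}, randomized: {randomized_estimate:.2f}")
```

The naive contrast here reflects only who chose to enroll, which is exactly the bias described in note 7.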

In proposing that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention programs, the panel also recognizes that there are situations in which randomization will be impossible or, for other reasons, cannot be used. In its next report the panel will describe at length appropriate nonexperimental strategies to be considered in situations in which an experiment is not a practical or desirable alternative.

The Management of Evaluation

Conscientious evaluation requires a considerable investment of funds, time, and personnel. Because the panel recognizes that resources are not unlimited, it suggests that they be concentrated on the evaluation of a subset of projects to maximize the return on investment and to enhance the likelihood of high-quality results.

Project Selection

Deciding which programs or sites to evaluate is by no means a trivial matter. Selection should be carefully weighed so that projects that are not replicable or that have little chance for success are not subjected to rigorous evaluations.

The panel recommends that any intensive evaluation of an intervention be conducted on a subset of projects selected according to explicit criteria. These criteria should include the replicability of the project, the feasibility of evaluation, and the project's potential effectiveness for prevention of HIV transmission.

If a project is replicable, it means that the particular circumstances of service delivery in that project can be duplicated: for CBOs and counseling and testing projects, the content and setting of an intervention can be duplicated across sites. Feasibility of evaluation means that, as a practical matter, the research can be done: the research design is adequate to control for rival hypotheses, it is not excessively costly, and the project is acceptable to the community and the sponsor. Potential effectiveness for HIV prevention means that the intervention, if it has not already been found effective in related circumstances, is at least based on a reasonable theory (or mix of theories) of behavioral change (e.g., social learning theory [Bandura, 1977] or the health belief model [Janz and Becker, 1984]).
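Applied literally, the panel's three criteria amount to an explicit screen over candidate projects. The sketch below is purely illustrative (the project names and judgments are hypothetical, not from the report):

```python
# Hypothetical screening of candidate projects against the panel's three
# explicit criteria: replicability, feasibility of evaluation, and
# potential effectiveness (grounding in a theory of behavioral change).
candidates = [
    {"name": "Project A", "replicable": True,  "feasible": True,  "theory_based": True},
    {"name": "Project B", "replicable": False, "feasible": True,  "theory_based": True},
    {"name": "Project C", "replicable": True,  "feasible": False, "theory_based": True},
]

selected = [p["name"] for p in candidates
            if p["replicable"] and p["feasible"] and p["theory_based"]]
print(selected)  # ['Project A']
```

In practice each criterion is a graded judgment rather than a boolean, but making the screen explicit is the point of the recommendation.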

In addition, it is important to ensure that the results of evaluations will be broadly applicable. Accordingly:

The panel recommends that evaluation be conducted and replicated across major types of subgroups, programs, and settings. Attention should be paid to geographic areas with low and high AIDS prevalence, as well as to subpopulations at low and high risk for AIDS.

Research Administration

The sponsoring agency interested in evaluating an AIDS intervention should consider the mechanisms through which the research will be carried out as well as the desirability of both independent oversight and agency in-house conduct and monitoring of the research. The appropriate entities and mechanisms for conducting evaluations depend to some extent on the kinds of data being gathered and the evaluation questions being asked.

Oversight and monitoring are important to keep projects fully informed about the other evaluations relevant to their own and to render assistance when needed. Oversight and monitoring are also important because evaluation is often a sensitive issue for project and evaluation staff alike. The panel is aware that evaluation may appear threatening to practitioners and researchers because of the possibility that evaluation research will show that their projects are not as effective as they believe them to be. These needs and vulnerabilities should be taken into account as evaluation research management is developed.

Conducting the Research

To conduct some aspects of a project's evaluation, it may be appropriate to involve project administrators, especially when the data will be used to evaluate delivery systems (e.g., to determine when and which services are being delivered). To evaluate outcomes, the services of an outside evaluator9 or evaluation team are almost always required because few practitioners have the necessary professional experience or the time and resources to do evaluation. The outside evaluator must have relevant expertise in evaluation research methodology and must also be sensitive to the fears, hopes, and constraints of project administrators.

Several evaluation management schemes are possible. For example, a prospective AIDS prevention project group (the contractor) can bid on a contract for project funding that includes an intensive evaluation component. The actual evaluation can be conducted either by the contractor alone or by the contractor working in concert with an outside independent collaborator. This mechanism has the advantage of involving project practitioners in the work of evaluation as well as building separate but mutually informing communities of experts around the country. Alternatively, a contract can be let with a single evaluator or evaluation team that will collaborate with the subset of sites that is chosen for evaluation. This variation would be managerially less burdensome than awarding separate contracts, but it would require greater dependence on the expertise of a single investigator or investigative team. (Appendix A discusses contracting options in greater depth.) Both of these approaches accord with the parent committee's recommendation that collaboration between practitioners and evaluation researchers be ensured. Finally, in the more traditional evaluation approach, independent principal investigators or investigative teams may respond to a request for proposal (RFP) issued to evaluate individual projects. Such investigators are frequently university-based or are members of a professional research organization, and they bring to the task a variety of research experiences and perspectives.

Independent Oversight

The panel believes that coordination and oversight of multisite evaluations is critical because of the variability in investigators' expertise and in the results of the projects being evaluated. Oversight can provide quality control for individual investigators and can be used to review and integrate findings across sites for developing policy. The independence of an oversight body is crucial to ensure that project evaluations do not succumb to the pressures for positive findings of effectiveness.

When evaluation is to be conducted by a number of different evaluation teams, the panel recommends establishing an independent scientific committee to oversee project selection and research efforts, corroborate the impartiality and validity of results, conduct cross-site analyses, and prepare reports on the progress of the evaluations.

The composition of such an independent oversight committee will depend on the research design of a given program. For example, the committee ought to include statisticians and other specialists in randomized field tests when that approach is being taken. Specialists in survey research and case studies should be recruited if either of those approaches is to be used. Appendix B offers a model for an independent oversight group that has been successfully implemented in other settings—a project review team, or advisory board.

Agency In-House Team

As the parent committee noted in its report, evaluations of AIDS interventions require skills that may be in short supply for agencies invested in delivering services (Turner, Miller, and Moses, 1989:349). Although this situation can be partly alleviated by recruiting professional outside evaluators and retaining an independent oversight group, the panel believes that an in-house team of professionals within the sponsoring agency is also critical. The in-house experts will interact with the outside evaluators and provide input into the selection of projects, outcome objectives, and appropriate research designs; they will also monitor the progress and costs of evaluation. These functions require not just bureaucratic oversight but appropriate scientific expertise.

This is not intended to preclude the direct involvement of CDC staff in conducting evaluations. However, given the great amount of work to be done, it is likely that a considerable portion will have to be contracted out. The quality and usefulness of the evaluations done under contract can be greatly enhanced by ensuring that there are an adequate number of CDC staff trained in evaluation research methods to monitor these contracts.

The panel recommends that CDC recruit and retain behavioral, social, and statistical scientists trained in evaluation methodology to facilitate the implementation of the evaluation research recommended in this report.

Interagency Collaboration

The panel believes that the federal agencies that sponsor the design of basic research, intervention programs, and evaluation strategies would profit from greater interagency collaboration. The evaluation of AIDS intervention programs would benefit from a coherent program of studies that should provide models of efficacious and effective interventions to prevent further HIV transmission, the spread of other STDs, and unwanted pregnancies (especially among adolescents). A marriage could then be made of basic and applied science, from which the best evaluation is born. Exploring the possibility of interagency collaboration and CDC's role in such collaboration is beyond the scope of this panel's task, but it is an important issue that we suggest be addressed in the future.

Costs of Evaluation

In view of the dearth of current evaluation efforts, the panel believes that vigorous evaluation research must be undertaken over the next few years to build up a body of knowledge about what interventions can and cannot do. Dedicating no resources to evaluation will virtually guarantee that high-quality evaluations will be infrequent and the data needed for policy decisions will be sparse or absent. Yet, evaluating every project is not feasible simply because there are not enough resources and, in many cases, evaluating every project is not necessary for good science or good policy.

The panel believes that evaluating only some of a program's sites or projects, selected under the criteria noted in Chapter 4 , is a sensible strategy. Although we recommend that intensive evaluation be conducted on only a subset of carefully chosen projects, we believe that high-quality evaluation will require a significant investment of time, planning, personnel, and financial support. The panel's aim is to be realistic—not discouraging—when it notes that the costs of program evaluation should not be underestimated. Many of the research strategies proposed in this report require investments that are perhaps greater than has been previously contemplated. This is particularly the case for outcome evaluations, which are ordinarily more difficult and expensive to conduct than formative or process evaluations. And those costs will be additive with each type of evaluation that is conducted.

Panel members have found that the cost of an outcome evaluation sometimes equals or even exceeds the cost of actual program delivery. For example, it was reported to the panel that randomized studies used to evaluate recent manpower training projects cost as much as the projects themselves (see Cottingham and Rodriguez, 1987). In another case, the principal investigator of an ongoing AIDS prevention project told the panel that the cost of randomized experimentation was approximately three times higher than the cost of delivering the intervention (albeit the study was quite small, involving only 104 participants) (Kelly et al., 1989). Fortunately, only a fraction of a program's projects or sites need to be intensively evaluated to produce high-quality information, and not all will require randomized studies.

Because of the variability in kinds of evaluation that will be done as well as in the costs involved, there is no set standard or rule for judging what fraction of a total program budget should be invested in evaluation. Based upon very limited data10 and assuming that only a small sample of projects would be evaluated, the panel suspects that program managers might reasonably anticipate spending 8 to 12 percent of their intervention budgets to conduct high-quality evaluations (i.e., formative, process, and outcome evaluations).11 Larger investments seem politically infeasible and unwise in view of the need to put resources into program delivery. Smaller investments in evaluation may risk studying an inadequate sample of program types and may also invite compromises in research quality.
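Worked through for a hypothetical intervention budget (the dollar figure below is illustrative, not from the report), the panel's 8-to-12-percent range implies:

```python
def evaluation_budget_range(intervention_budget, low=0.08, high=0.12):
    """Range the panel suggests a program manager might anticipate
    spending on high-quality (formative, process, and outcome)
    evaluation, as a fraction of the intervention budget."""
    return intervention_budget * low, intervention_budget * high

# Hypothetical $5M intervention budget.
lo, hi = evaluation_budget_range(5_000_000)
print(f"${lo:,.0f} to ${hi:,.0f}")  # $400,000 to $600,000
```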

The nature of the HIV/AIDS epidemic mandates an unwavering commitment to prevention programs, and the prevention activities require a similar commitment to the evaluation of those programs. The magnitude of what can be learned from doing good evaluations will more than balance the magnitude of the costs required to perform them. Moreover, it should be realized that the costs of shoddy research can be substantial, both in their direct expense and in the lost opportunities to identify effective strategies for AIDS prevention. Once the investment has been made, however, and a reservoir of findings and practical experience has accumulated, subsequent evaluations should be easier and less costly to conduct.

  • Bandura, A. (1977) Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84:191-215. [PubMed: 847061]
  • Campbell, D. T., and Stanley, J. C. (1966) Experimental and Quasi-Experimental Design and Analysis. Boston: Houghton-Mifflin.
  • Centers for Disease Control (CDC) (1988) Sourcebook presented at the National Conference on the Prevention of HIV Infection and AIDS Among Racial and Ethnic Minorities in the United States (August).
  • Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, N.J.: L. Erlbaum Associates.
  • Cook, T., and Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis for Field Settings. Boston: Houghton-Mifflin.
  • Federal Judicial Center (1981) Experimentation in the Law. Washington, D.C.: Federal Judicial Center.
  • Janz, N. K., and Becker, M. H. (1984) The health belief model: A decade later. Health Education Quarterly 11(1):1-47. [PubMed: 6392204]
  • Kelly, J. A., St. Lawrence, J. S., Hood, H. V., and Brasfield, T. L. (1989) Behavioral intervention to reduce AIDS risk activities. Journal of Consulting and Clinical Psychology 57:60-67. [PubMed: 2925974]
  • Meier, P. (1957) Safety testing of poliomyelitis vaccine. Science 125(3257):1067-1071. [PubMed: 13432758]
  • Roethlisberger, F. J., and Dickson, W. J. (1939) Management and the Worker. Cambridge, Mass.: Harvard University Press.
  • Rossi, P. H., and Freeman, H. E. (1982) Evaluation: A Systematic Approach. 2nd ed. Beverly Hills, Cal.: Sage Publications.
  • Turner, C. F., Miller, H. G., and Moses, L. E., eds. (1989) AIDS, Sexual Behavior, and Intravenous Drug Use. Report of the NRC Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National Academy Press. [PubMed: 25032322]
  • Weinstein, M. C., Graham, J. D., Siegel, J. E., and Fineberg, H. V. (1989) Cost-effectiveness analysis of AIDS prevention programs: Concepts, complications, and illustrations. In C. F. Turner, H. G. Miller, and L. E. Moses, eds., AIDS, Sexual Behavior, and Intravenous Drug Use. Washington, D.C.: National Academy Press.
  • Weiss, C. H. (1972) Evaluation Research. Englewood Cliffs, N.J.: Prentice-Hall.

Notes

1. On occasion, nonparticipants observe behavior during or after an intervention. Chapter 3 introduces this option in the context of formative evaluation.

2. The use of professional customers can raise serious concerns in the eyes of project administrators at counseling and testing sites. The panel believes that site administrators should receive advance notification that professional customers may visit their sites for testing and counseling services and should provide their consent before this method of data collection is used.

3. Parts of this section are adapted from Turner, Miller, and Moses (1989:324-326).

4. This weakness has been noted by CDC in a sourcebook provided to its HIV intervention project grantees (CDC, 1988:F-14).

5. The significance tests applied to experimental outcomes calculate the probability that any observed differences between the sample estimates might result from random variation between the groups.

6. Research participants' knowledge that they were being observed had a positive effect on their responses in a series of famous studies at General Electric's Hawthorne Works in Chicago (Roethlisberger and Dickson, 1939); the phenomenon is referred to as the Hawthorne effect.

7. Participants who self-select into a program are likely to differ from nonrandom comparison groups in interests, motivations, values, abilities, and other attributes that can bias the outcomes.

8. A double-blind test is one in which neither the person receiving the treatment nor the person administering it knows which treatment (or whether no treatment) is being given.

9. As discussed under "Agency In-House Team," the outside evaluator might be one of CDC's personnel. However, given the large amount of research to be done, it is likely that non-CDC evaluators will also need to be used.

10. See, for example, Chapter 3, which presents cost estimates for evaluations of media campaigns. Similar estimates are not readily available for other program types.

11. For example, the U.K. Health Education Authority (that country's primary agency for AIDS education and prevention programs) allocates 10 percent of its AIDS budget to research and evaluation of its AIDS programs (D. McVey, Health Education Authority, personal communication, June 1990). This allocation covers both process and outcome evaluation.

  • Cite this page: National Research Council (US) Panel on the Evaluation of AIDS Interventions; Coyle SL, Boruch RF, Turner CF, editors. Evaluating AIDS Prevention Programs: Expanded Edition. Washington (DC): National Academies Press (US); 1991. Chapter 1, Design and Implementation of Evaluation Research.





Toward a framework for selecting indicators of measuring sustainability and circular economy in the agri-food sector: a systematic literature review

  • Published: 02 March 2022

Cecilia Silvestri, Luca Silvestri, Michela Piccarozzi & Alessandro Ruggieri

A Correction to this article was published on 24 March 2022


The implementation of sustainability and circular economy (CE) models in agri-food production can promote resource efficiency, reduce environmental burdens, and ensure improved and socially responsible systems. In this context, indicators for the measurement of sustainability play a crucial role. Indicators can measure CE strategies aimed to preserve functions, products, components, materials, or embodied energy. Although there is broad literature describing sustainability and CE indicators, no study offers a comprehensive framework of indicators for measuring sustainability and CE in the agri-food sector.

Starting from this central research gap, a systematic literature review was conducted to examine how sustainability is measured in the agri-food sector and, based on these findings, to understand how indicators are used and for which specific purposes.

The analysis of the results allowed us to classify the sample of articles in three main clusters (“Assessment-LCA,” “Best practice,” and “Decision-making”) and has shown increasing attention to the three pillars of sustainability (triple bottom line). In this context, an integrated approach of indicators (environmental, social, and economic) offers the best solution to ensure an easier transition to sustainability.


The sample analysis facilitated the identification of new categories of impact that deserve attention, such as the cooperation among stakeholders in the supply chain and eco-innovation.


Figures (all: Source: Authors' elaboration; images not reproduced, captions only):

  • Temporal distribution of the articles under analysis
  • Time distribution of articles from the three major journals
  • Composition of the sample according to the three clusters identified by the analysis
  • Distribution of articles over time by cluster
  • Network visualization
  • Overlay visualization
  • Classification of articles by scientific field
  • Article classification by cluster and scientific field
  • Distribution of items over time based on the TBL
  • Pareto diagram of the indicators most used in the literature for measuring sustainability in the agri-food sector
  • Distribution over time of articles divided into conceptual and empirical
  • Classification of conceptual and empirical articles, in-depth analysis
  • Geographical distribution of the authors
  • Distribution of authors by continent of origin
  • Time distribution of publications by authors' continent of origin
  • Integration of sustainability measurement indicators and impact categories of LCA, S-LCA, and LCC tools to provide stakeholders with best practices as guidelines and tools supporting decision-making and measurement under the circular economy approach



Acero AP, Rodriguez C, Ciroth A (2017) LCIA methods: impact assessment methods in life cycle assessment and their impact categories. Version 1.5.6. Green Delta 1–23

Accorsi R, Versari L, Manzini R (2015) Glass vs. plastic: Life cycle assessment of extra-virgin olive oil bottles across global supply chains. Sustain 7:2818–2840.

Adjei-Bamfo P, Maloreh-Nyamekye T, Ahenkan A (2019) The role of e-government in sustainable public procurement in developing countries: a systematic literature review. Resour Conserv Recycl 142:189–203.

Article   Google Scholar  

Aivazidou E, Tsolakis N, Vlachos D, Iakovou E (2015) Water footprint management policies for agrifood supply chains: a critical taxonomy and a system dynamics modelling approach. Chem Eng Trans 43:115–120.

Alhaddi H (2015) Triple bottom line and sustainability: a literature review. Bus Manag Stud 1:6–10

Allaoui H, Guo Y, Sarkis J (2019) Decision support for collaboration planning in sustainable supply chains. J Clean Prod 229:761–774.

Alshqaqeeq F, Amin Esmaeili M, Overcash M, Twomey J (2020) Quantifying hospital services by carbon footprint: a systematic literature review of patient care alternatives. Resour Conserv Recycl 154:104560.

Anwar F, Chaudhry FN, Nazeer S et al (2016) Causes of ozone layer depletion and its effects on human: review. Atmos Clim Sci 06:129–134.

Aquilani B, Silvestri C, Ruggieri A (2016). A Systematic Literature Review on Total Quality Management Critical Success Factors and the Identification of New Avenues of Research.

Aramyan L, Hoste R, Van Den Broek W et al (2011) Towards sustainable food production: a scenario study of the European pork sector. J Chain Netw Sci 11:177–189.

Arfini F, Antonioli F, Cozzi E et al (2019) Sustainability, innovation and rural development: the case of Parmigiano-Reggiano PDO. Sustain 11:1–17.

Assembly UG (2005) Resolution adopted by the general assembly. New York, NY

Avilés-Palacios C, Rodríguez-Olalla A (2021) The sustainability of waste management models in circular economies. Sustain 13:1–19.

Azevedo SG, Silva ME, Matias JCO, Dias GP (2018) The influence of collaboration initiatives on the sustainability of the cashew supply chain. Sustain 10:1–29.

Bajaj S, Garg R, Sethi M (2016) Total quality management: a critical literature review using Pareto analysis. Int J Product Perform Manag 67:128–154

Banasik A, Kanellopoulos A, Bloemhof-Ruwaard JM, Claassen GDH (2019) Accounting for uncertainty in eco-efficient agri-food supply chains: a case study for mushroom production planning. J Clean Prod 216:249–256.

Barth H, Ulvenblad PO, Ulvenblad P (2017) Towards a conceptual framework of sustainable business model innovation in the agri-food sector: a systematic literature review. Sustain 9.

Bastas A, Liyanage K (2018) Sustainable supply chain quality management: a systematic review

Beckerman W (1992) Economic growth and the environment: whose growth? Whose environment? World Dev 20:481–496.

Belaud JP, Prioux N, Vialle C, Sablayrolles C (2019) Big data for agri-food 4.0: application to sustainability management for by-products supply chain. Comput Ind 111:41–50.

Bele B, Norderhaug A, Sickel H (2018) Localized agri-food systems and biodiversity. Agric 8.

El Bilali H, Calabrese G, Iannetta M et al (2020) Environmental sustainability of typical agro-food products: a scientifically sound and user-friendly approach. New Medit 19:69–83.

Blanc S, Massaglia S, Brun F et al (2019) Use of bio-based plastics in the fruit supply chain: an integrated approach to assess environmental, economic, and social sustainability. Sustain 11.

Bloemhof JM, van der Vorst JGAJ, Bastl M, Allaoui H (2015) Sustainability assessment of food chain logistics. Int J Logist Res Appl 18:101–117.

Bonisoli L, Galdeano-Gómez E, Piedra-Muñoz L (2018) Deconstructing criteria and assessment tools to build agri-sustainability indicators and support farmers’ decision-making process. J Clean Prod 182:1080–1094.

Bonisoli L, Galdeano-Gómez E, Piedra-Muñoz L, Pérez-Mesa JC (2019) Benchmarking agri-food sustainability certifications: evidences from applying SAFA in the Ecuadorian banana agri-system. J Clean Prod 236.

Bornmann L, Haunschild R, Hug SE (2018) Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis. Scientometrics 114:427–437.

Boulding KE (1966) The economics of the coming spaceship earth. New York, pp 1–17

Bracquené E, Dewulf W, Duflou JR (2020) Measuring the performance of more circular complex product supply chains. Resour Conserv Recycl 154:104608.

Burck J, Hagen U, Bals C et al (2021) Climate Change Performance Index

Calisto Friant M, Vermeulen WJV, Salomone R (2020) A typology of circular economy discourses: navigating the diverse visions of a contested paradigm. Resour Conserv Recycl 161:104917.

Campbell BM, Beare DJ, Bennett EM et al (2017) Agriculture production as a major driver of the earth system exceeding planetary boundaries. Ecol Soc 22.

Capitanio F, Coppola A, Pascucci S (2010) Product and process innovation in the Italian food industry. Agribusiness 26:503–518.

Caputo P, Zagarella F, Cusenza MA et al (2020) Energy-environmental assessment of the UIA-OpenAgri case study as urban regeneration project through agriculture. Sci Total Environ 729:138819.

Chabowski BR, Mena JA, Gonzalez-Padron TL (2011) The structure of sustainability research in marketing, 1958–2008: a basis for future research opportunities. J Acad Mark Sci 39:55–70.

Chadegani AA, Salehi H, Yunus M et al (2017) A comparison between two main academic literature collections: Web of Science and Scopus databases. Asian Soc Sci 9:18–26.

Chams N, Guesmi B, Gil JM (2020) Beyond scientific contribution: assessment of the societal impact of research and innovation to build a sustainable agri-food sector. J Environ Manage 264.

Chandrakumar C, McLaren SJ, Jayamaha NP, Ramilan T (2019) Absolute sustainability-based life cycle assessment (ASLCA): a benchmarking approach to operate agri-food systems within the 2°C global carbon budget. J Ind Ecol 23:906–917.

Chaparro-Africano AM (2019) Toward generating sustainability indicators for agroecological markets. Agroecol Sustain Food Syst 43:40–66.

Colicchia C, Strozzi F (2012) Supply chain risk management: a new methodology for a systematic literature review

Conca L, Manta F, Morrone D, Toma P (2021) The impact of direct environmental, social, and governance reporting: empirical evidence in European-listed companies in the agri-food sector. Bus Strateg Environ 30:1080–1093.

Coppola A, Ianuario S, Romano S, Viccaro M (2020) Corporate social responsibility in agri-food firms: the relationship between CSR actions and firm’s performance. AIMS Environ Sci 7:542–558.

Corona B, Shen L, Reike D et al (2019) Towards sustainable development through the circular economy—a review and critical assessment on current circularity metrics. Resour Conserv Recycl 151:104498.

Correia MS (2019) Sustainability: an overview of the triple bottom line and sustainability implementation. Int J Strateg Eng 2:29–38.

Coteur I, Marchand F, Debruyne L, Lauwers L (2019) Structuring the myriad of sustainability assessments in agri-food systems: a case in Flanders. J Clean Prod 209:472–480.

CREA (2020) L’agricoltura italiana conta 2019

Crenna E, Sala S, Polce C, Collina E (2017) Pollinators in life cycle assessment: towards a framework for impact assessment. J Clean Prod 140:525–536.

D’Eusanio M, Serreli M, Zamagni A, Petti L (2018) Assessment of social dimension of a jar of honey: a methodological outline. J Clean Prod 199:503–517.

Dania WAP, Xing K, Amer Y (2018) Collaboration behavioural factors for sustainable agri-food supply chains: a systematic review. J Clean Prod 186:851–864

De Pascale A, Arbolino R, Szopik-Depczyńska K et al (2021) A systematic review for measuring circular economy: the 61 indicators. J Clean Prod 281.

De Schoenmakere M, Gillabel J (2017) Circular by design: products in the circular economy

Del Borghi A, Gallo M, Strazza C, Del Borghi M (2014) An evaluation of environmental sustainability in the food industry through life cycle assessment: the case study of tomato products supply chain. J Clean Prod 78:121–130.

Del Borghi A, Strazza C, Magrassi F et al (2018) Life cycle assessment for eco-design of product–package systems in the food industry—the case of legumes. Sustain Prod Consum 13:24–36.

Denyer D, Tranfield D (2009) Producing a systematic review. In: Buchanan B (ed) The sage handbook of organization research methods. Sage Publications Ltd, Cornwall, pp 671–689

Dietz T, Grabs J, Chong AE (2019) Mainstreamed voluntary sustainability standards and their effectiveness: evidence from the Honduran coffee sector. Regul Gov.

Dixon-Woods M (2011) Using framework-based synthesis for conducting reviews of qualitative studies. BMC Med 9:9–10.

do Canto NR, Bossle MB, Marques L, Dutra M (2020) Supply chain collaboration for sustainability: a qualitative investigation of food supply chains in Brazil. Manag Environ Qual Int J.

dos Santos RR, Guarnieri P (2020) Social gains for artisanal agroindustrial producers induced by cooperation and collaboration in agri-food supply chain. Soc Responsib J.

Doukidis GI, Matopoulos A, Vlachopoulou M, Manthou V, Manos B (2007) A conceptual framework for supply chain collaboration: empirical evidence from the agri-food industry. Supply Chain Manag Int J 12:177–186.

Durach CF, Kembro J, Wieland A (2017) A new paradigm for systematic literature reviews in supply chain management. J Supply Chain Manag 53:67–85.

Durán-Sánchez A, Álvarez-García J, del Río-Rama MC (2018) Sustainable water resources management: a bibliometric overview. Water 10:1–19.

Duru M, Therond O (2015) Livestock system sustainability and resilience in intensive production zones: which form of ecological modernization? Reg Environ Chang 15:1651–1665.

Edison Fondazione (2019) Le eccellenze agricole italiane. I primati europei e mondiali dell’Italia nei prodotti vegetali. Milan (IT)

Ehrenfeld JR (2005) The roots of sustainability. MIT Sloan Manag Rev 46:23–25

Elia V, Gnoni MG, Tornese F (2017) Measuring circular economy strategies through index methods: a critical analysis. J Clean Prod 142:2741–2751.

Elkington J (1997) Cannibals with forks: the triple bottom line of 21st century business. Capstone, Oxford

Esposito B, Sessa MR, Sica D, Malandrino O (2020) Towards circular economy in the agri-food sector. A systematic literature review. Sustain 12.

European Commission (2018) Agri-food trade in 2018

European Commission (2019) Monitoring EU agri-food trade: development until September 2019

Eurostat (2018) Small and large farms in the EU - statistics from the farm structure survey

FAO (2011) Biodiversity for food and agriculture. Rome, Italy

FAO (2012) Energy-smart food at FAO: an overview. Rome, Italy

FAO (2014) Food wastage footprint: full cost-accounting

FAO (2016) The state of food and agriculture: climate change, agriculture and food security. Rome, Italy

FAO (2017) The future of food and agriculture: trends and challenges. Rome, Italy

FAO (2020) The state of food security and nutrition in the world. Transforming Food Systems for Affordable Healthy Diets. Rome, Italy

Fassio F, Tecco N (2019) Circular economy for food: a systemic interpretation of 40 case histories in the food system in their relationships with SDGs. Systems 7:43.

Fathollahi A, Coupe SJ (2021) Life cycle assessment (LCA) and life cycle costing (LCC) of road drainage systems for sustainability evaluation: quantifying the contribution of different life cycle phases. Sci Total Environ 776:145937.

Ferreira VJ, Arnal ÁJ, Royo P et al (2019) Energy and resource efficiency of electroporation-assisted extraction as an emerging technology towards a sustainable bio-economy in the agri-food sector. J Clean Prod 233:1123–1132.

Fiksel J (2006) A framework for sustainable remediation. JOM 8:15–22.

Flick U (2014) An introduction to qualitative research

Franciosi C, Voisin A, Miranda S et al (2020) Measuring maintenance impacts on sustainability of manufacturing industries: from a systematic literature review to a framework proposal. J Clean Prod 260:1–19.

Gaitán-Cremaschi D, Meuwissen MPM, Oude Lansink AGJM (2017) Total factor productivity: a framework for measuring agri-food supply chain performance towards sustainability. Appl Econ Perspect Policy 39:259–285.

Galdeano-Gómez E, Zepeda-Zepeda JA, Piedra-Muñoz L, Vega-López LL (2017) Family farm’s features influencing socio-economic sustainability: an analysis of the agri-food sector in southeast Spain. New Medit 16:50–61

Gallopín G, Herrero LMJ, Rocuts A (2014) Conceptual frameworks and visual interpretations of sustainability. Int J Sustain Dev 17:298–326.

Gallopín GC (2003) Sostenibilidad y desarrollo sostenible: un enfoque sistémico. CEPAL, Santiago, Chile

Garnett T (2013) Food sustainability: problems, perspectives and solutions. Proc Nutr Soc 72:29–39.

Garofalo P, D’Andrea L, Tomaiuolo M et al (2017) Environmental sustainability of agri-food supply chains in Italy: the case of the whole-peeled tomato production under life cycle assessment methodology. J Food Eng 200:1–12.

Gava O, Bartolini F, Venturi F et al (2018) A reflection of the use of the life cycle assessment tool for agri-food sustainability. Sustain 11.

Gazzola P, Querci E (2017) The connection between the quality of life and sustainable ecological development. Eur Sci J

Geissdoerfer M, Savaget P, Bocken N, Hultink EJ (2017) The circular economy – a new sustainability paradigm? J Clean Prod 143:757–768.

Georgescu-Roegen N (1971) The entropy law and the economic process. Harvard University Press, Cambridge, MA

Gerbens-Leenes PW, Moll HC, Schoot Uiterkamp AJM (2003) Design and development of a measuring method for environmental sustainability in food production systems. Ecol Econ 46:231–248.

Gésan-Guiziou G, Alaphilippe A, Aubin J et al (2020) Diversity and potentiality of multi-criteria decision analysis methods for agri-food research. Agron Sustain Dev 40.

Ghisellini P, Cialani C, Ulgiati S (2016) A review on circular economy: the expected transition to a balanced interplay of environmental and economic systems. J Clean Prod 114:11–32.

Godoy-Durán Á, Galdeano-Gómez E, Pérez-Mesa JC, Piedra-Muñoz L (2017) Assessing eco-efficiency and the determinants of horticultural family-farming in southeast Spain. J Environ Manage 204:594–604.

Gold S, Kunz N, Reiner G (2017) Sustainable global agrifood supply chains: exploring the barriers. J Ind Ecol 21:249–260.

Goucher L, Bruce R, Cameron DD et al (2017) The environmental impact of fertilizer embodied in a wheat-to-bread supply chain. Nat Plants 3:1–5.

Green A, Nemecek T, Chaudhary A, Mathys A (2020) Assessing nutritional, health, and environmental sustainability dimensions of agri-food production. Glob Food Sec 26:100406.

Guinée JB, Heijungs R, Huppes G et al (2011) Life cycle assessment: past, present, and future. Environ Sci Technol 45:90–96.

Guiomar N, Godinho S, Pinto-Correia T et al (2018) Typology and distribution of small farms in Europe: towards a better picture. Land Use Policy 75:784–798.

Gunasekaran A, Patel C, McGaughey RE (2004) A framework for supply chain performance measurement. Int J Prod Econ 87:333–347.

Gunasekaran A, Patel C, Tirtiroglu E (2001) Performance measures and metrics in a supply chain environment. Int J Oper Prod Manag 21:71–87.

Hamam M, Chinnici G, Di Vita G et al (2021) Circular economy models in agro-food systems: a review. Sustain 13

Harun SN, Hanafiah MM, Aziz NIHA (2021) An LCA-based environmental performance of rice production for developing a sustainable agri-food system in Malaysia. Environ Manage 67:146–161.

Harvey M, Pilgrim S (2011) The new competition for land: food, energy, and climate change. Food Policy 36:S40–S51.

Hawkes C, Ruel MT (2006) Understanding the links between agriculture and health. International Food Policy Research Institute, Washington, DC

Hellweg S, Milà i Canals L (2014) Emerging approaches, challenges and opportunities in life cycle assessment. Science 344:1109–1113.

Higgins V, Dibden J, Cocklin C (2015) Private agri-food governance and greenhouse gas abatement: constructing a corporate carbon economy. Geoforum 66:75–84.

Hill T (1995) Manufacturing strategy: text and cases. Macmillan

Hjeresen DD, Gonzales R (2020) Can green chemistry promote sustainable agriculture? The rewards are higher yields and less environmental contamination. Environ Sci Technol 103–107

Horne R, Grant T, Verghese K (2009) Life cycle assessment: principles, practice, and prospects. Csiro Publishing, Collingwood, Australia

Horton P, Koh L, Guang VS (2016) An integrated theoretical framework to enhance resource efficiency, sustainability and human health in agri-food systems. J Clean Prod 120:164–169.

Hospido A, Davis J, Berlin J, Sonesson U (2010) A review of methodological issues affecting LCA of novel food products. Int J Life Cycle Assess 15:44–52.

Huffman T, Liu J, Green M et al (2015) Improving and evaluating the soil cover indicator for agricultural land in Canada. Ecol Indic 48:272–281.

Ilbery B, Maye D (2005) Food supply chains and sustainability: evidence from specialist food producers in the Scottish/English borders. Land Use Policy 22:331–344.

Ingrao C, Faccilongo N, Valenti F et al (2019) Tomato puree in the Mediterranean region: an environmental life cycle assessment, based upon data surveyed at the supply chain level. J Clean Prod 233:292–313.

Iocola I, Angevin F, Bockstaller C et al (2020) An actor-oriented multi-criteria assessment framework to support a transition towards sustainable agricultural systems based on crop diversification. Sustain 12.

Irabien A, Darton RC (2016) Energy–water–food nexus in the Spanish greenhouse tomato production. Clean Technol Environ Policy 18:1307–1316.

ISO 14040:2006 (2006) Environmental management — life cycle assessment — principles and framework

ISO 14044:2006 (2006) Environmental management — life cycle assessment — requirements and guidelines

ISO 15392:2008 (2008) Sustainability in building construction–general principles

Istat (2019) Andamento dell’economia agricola

Jaakkola E (2020) Designing conceptual articles: four approaches. AMS Rev 1–9.

Jin R, Yuan H, Chen Q (2019) Science mapping approach to assisting the review of construction and demolition waste management research published between 2009 and 2018. Resour Conserv Recycl 140:175–188.

Johnston P, Everard M, Santillo D, Robèrt KH (2007) Reclaiming the definition of sustainability. Environ Sci Pollut Res Int 14:60–66.

Jorgensen SE, Burkhard B, Müller F (2013) Twenty volumes of ecological indicators-an accounting short review. Ecol Indic 28:4–9.

Joshi S, Sharma M, Kler R (2020) Modeling circular economy dimensions in agri-tourism clusters: sustainable performance and future research directions. Int J Math Eng Manag Sci 5:1046–1061.

Kamilaris A, Gao F, Prenafeta-Boldu FX, Ali MI (2017) Agri-IoT: a semantic framework for Internet of Things-enabled smart farming applications. In: 2016 IEEE 3rd World Forum on Internet of Things, WF-IoT 2016. pp 442–447

Karuppusami G, Gandhinathan R (2006) Pareto analysis of critical success factors of total quality management: a literature review and analysis. TQM Mag 18:372–385.

Kates RW, Parris TM, Leiserowitz AA (2005) What is sustainable development? Goals, indicators, values, and practice. Environ Sci Policy Sustain Dev 47:8–21.

Khounani Z, Hosseinzadeh-Bandbafha H, Moustakas K et al (2021) Environmental life cycle assessment of different biorefinery platforms valorizing olive wastes to biofuel, phosphate salts, natural antioxidant, and an oxygenated fuel additive (triacetin). J Clean Prod 278:123916.

Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering, version 2.3. EBSE Technical Report, Keele University

Korhonen J, Nuur C, Feldmann A, Birkie SE (2018) Circular economy as an essentially contested concept. J Clean Prod 175:544–552.

Kuisma M, Kahiluoto H (2017) Biotic resource loss beyond food waste: agriculture leaks worst. Resour Conserv Recycl 124:129–140.

Laso J, Hoehn D, Margallo M et al (2018) Assessing energy and environmental efficiency of the Spanish agri-food system using the LCA/DEA methodology. Energies 11.

Lee KM (2007) So what is the “triple bottom line”? Int J Divers Organ Communities Nations Annu Rev 6:67–72.

Lehmann RJ, Hermansen JE, Fritz M et al (2011) Information services for European pork chains - closing gaps in information infrastructures. Comput Electron Agric 79:125–136.

León-Bravo V, Caniato F, Caridi M, Johnsen T (2017) Collaboration for sustainability in the food supply chain: a multi-stage study in Italy. Sustainability 9:1253

Lepage A (2009) The quality of life as attribute of sustainability. TQM J 21:105–115.

Li CZ, Zhao Y, Xiao B et al (2020) Research trend of the application of information technologies in construction and demolition waste management. J Clean Prod 263.

Lo Giudice A, Mbohwa C, Clasadonte MT, Ingrao C (2014) Life cycle assessment interpretation and improvement of the Sicilian artichokes production. Int J Environ Res 8:305–316.

Lueddeckens S, Saling P, Guenther E (2020) Temporal issues in life cycle assessment—a systematic review. Int J Life Cycle Assess 25:1385–1401.

Luo J, Ji C, Qiu C, Jia F (2018) Agri-food supply chain management: bibliometric and content analyses. Sustain 10.

Lynch J, Donnellan T, Finn JA et al (2019) Potential development of Irish agricultural sustainability indicators for current and future policy evaluation needs. J Environ Manage 230:434–445.

MacArthur E (2013) Towards the circular economy. J Ind Ecol 2:23–44

MacArthur E (2017) Delivering the circular economy: a toolkit for policymakers. Ellen MacArthur Foundation

MacInnis DJ (2011) A framework for conceptual contributions in marketing. J Mark 75:136–154.

Mangla SK, Luthra S, Rich N et al (2018) Enablers to implement sustainable initiatives in agri-food supply chains. Int J Prod Econ 203:379–393.

Marotta G, Nazzaro C, Stanco M (2017) How the social responsibility creates value: models of innovation in Italian pasta industry. Int J Glob Small Bus 9:144–167.

Martucci O, Arcese G, Montauti C, Acampora A (2019) Social aspects in the wine sector: comparison between social life cycle assessment and VIVA sustainable wine project indicators. Resources 8.

Mayring P (2004) Qualitative content analysis. A Companion to Qual Res 1:159–176

McKelvey B (2002) Managing coevolutionary dynamics. In: 18th EGOS Conference. Barcelona, Spain, pp 1–21

McMichael AJ, Butler CD, Folke C (2003) New visions for addressing sustainability. Science 302:1919–1920

Mehmood A, Ahmed S, Viza E et al (2021) Drivers and barriers towards circular economy in agri-food supply chain: a review. Bus Strateg Dev 1–17.

Mella P, Gazzola P (2011) Sustainability and quality of life: the development model. In: Kapounek S (ed) Enterprise and competitive environment. Mendel University, Brno, Czechia, pp 542–551

Merli R, Preziosi M, Acampora A (2018) How do scholars approach the circular economy? A systematic literature review. J Clean Prod 178:703–722.

Merli R, Preziosi M, Acampora A et al (2020) Recycled fibers in reinforced concrete: a systematic literature review. J Clean Prod 248:119207.

Miglietta PP, Morrone D (2018) Managing water sustainability: virtual water flows and economic water productivity assessment of the wine trade between Italy and the Balkans. Sustain 10.

Mitchell MGE, Chan KMA, Newlands NK, Ramankutty N (2020) Spatial correlations don’t predict changes in agricultural ecosystem services: a Canada-wide case study. Front Sustain Food Syst 4:1–17.

Moraga G, Huysveld S, Mathieux F et al (2019) Circular economy indicators: what do they measure? Resour Conserv Recycl 146:452–461.

Morrissey JE, Dunphy NP (2015) Towards sustainable agri-food systems: the role of integrated sustainability and value assessment across the supply-chain. Int J Soc Ecol Sustain Dev 6:41–58.

Moser G (2009) Quality of life and sustainability: toward person-environment congruity. J Environ Psychol 29:351–357.

Muijs D (2010) Doing quantitative research in education with SPSS. Sage Publications, London

Muller MF, Esmanioto F, Huber N, Loures ER (2019) A systematic literature review of interoperability in the green Building Information Modeling lifecycle. J Clean Prod 223:397–412.

Muradin M, Joachimiak-Lechman K, Foltynowicz Z (2018) Evaluation of eco-efficiency of two alternative agricultural biogas plants. Appl Sci 8.

Naseer MAUR, Ashfaq M, Hassan S et al (2019) Critical issues at the upstream level in sustainable supply chain management of agri-food industries: evidence from Pakistan’s citrus industry. Sustain 11:1–19.

Nattassha R, Handayati Y, Simatupang TM, Siallagan M (2020) Understanding circular economy implementation in the agri-food supply chain: the case of an Indonesian organic fertiliser producer. Agric Food Secur 9:1–16.

Nazari-Sharabian M, Ahmad S, Karakouzian M (2018) Climate change and eutrophication: a short review. Eng Technol Appl Sci Res 8:3668–3672.

Nazir N (2017) Understanding life cycle thinking and its practical application to agri-food system. Int J Adv Sci Eng Inf Technol 7:1861–1870.

Negra C, Remans R, Attwood S et al (2020) Sustainable agri-food investments require multi-sector co-development of decision tools. Ecol Indic 110:105851.

Newsham KK, Robinson SA (2009) Responses of plants in polar regions to UVB exposure: a meta-analysis. Glob Chang Biol 15:2574–2589.

Niemeijer D, de Groot RS (2008) A conceptual framework for selecting environmental indicator sets. Ecol Indic 8:14–25.

Niero M, Kalbar PP (2019) Coupling material circularity indicators and life cycle based indicators: a proposal to advance the assessment of circular economy strategies at the product level. Resour Conserv Recycl 140:305–312.

Nikolaou IE, Tsagarakis KP (2021) An introduction to circular economy and sustainability: some existing lessons and future directions. Sustain Prod Consum 28:600–609.

Notarnicola B, Hayashi K, Curran MA, Huisingh D (2012) Progress in working towards a more sustainable agri-food industry. J Clean Prod 28:1–8.

Notarnicola B, Tassielli G, Renzulli PA, Monforti F (2017) Energy flows and greenhouses gases of EU (European Union) national breads using an LCA (life cycle assessment) approach. J Clean Prod 140:455–469.

Opferkuch K, Caeiro S, Salomone R, Ramos TB (2021) Circular economy in corporate sustainability reporting: a review of organisational approaches. Bus Strateg Environ 1–22.

Padilla-Rivera A, do Carmo BBT, Arcese G, Merveille N (2021) Social circular economy indicators: selection through fuzzy Delphi method. Sustain Prod Consum 26:101–110.

Pagotto M, Halog A (2016) Towards a circular economy in Australian agri-food industry: an application of input-output oriented approaches for analyzing resource efficiency and competitiveness potential. J Ind Ecol 20:1176–1186.

Parent G, Lavallée S (2011) LCA potentials and limits within a sustainable agri-food statutory framework. In: Global food insecurity. Springer Netherlands, Dordrecht, pp 161–171

Pattey E, Qiu G (2012) Trends in primary particulate matter emissions from Canadian agriculture. J Air Waste Manag Assoc 62:737–747.

Pauliuk S (2018) Critical appraisal of the circular economy standard BS 8001:2017 and a dashboard of quantitative system indicators for its implementation in organizations. Resour Conserv Recycl 129:81–92.

Peano C, Migliorini P, Sottile F (2014) A methodology for the sustainability assessment of agri-food systems: an application to the slow food presidia project. Ecol Soc 19.

Peano C, Tecco N, Dansero E et al (2015) Evaluating the sustainability in complex agri-food systems: the SAEMETH framework. Sustain 7:6721–6741.

Pearce DW, Turner RK (1990) Economics of natural resources and the environment. Harvester Wheatsheaf, Hemel Hempstead, Herts

Pelletier N (2018) Social sustainability assessment of Canadian egg production facilities: methods, analysis, and recommendations. Sustain 10:1–17.

Peña C, Civit B, Gallego-Schmid A et al (2021) Using life cycle assessment to achieve a circular economy. Int J Life Cycle Assess 26:215–220.

Perez Neira D (2016) Energy sustainability of Ecuadorian cacao export and its contribution to climate change. A case study through product life cycle assessment. J Clean Prod 112:2560–2568.

Pérez-Neira D, Grollmus-Venegas A (2018) Life-cycle energy assessment and carbon footprint of peri-urban horticulture. A comparative case study of local food systems in Spain. Landsc Urban Plan 172:60–68.

Pérez-Pons ME, Plaza-Hernández M, Alonso RS et al (2021) Increasing profitability and monitoring environmental performance: a case study in the agri-food industry through an edge-iot platform. Sustain 13:1–16.

Petti L, Serreli M, Di Cesare S (2018) Systematic literature review in social life cycle assessment. Int J Life Cycle Assess 23:422–431.

Pieroni MPP, McAloone TC, Pigosso DCA (2019) Business model innovation for circular economy and sustainability: a review of approaches. J Clean Prod 215:198–216.

Polit DF, Beck CT (2004) Nursing research: principles and methods. Lippincott Williams & Wilkins, Philadelphia, PA

Porkka M, Gerten D, Schaphoff S et al (2016) Causes and trends of water scarcity in food production. Environ Res Lett 11:015001.

Prajapati H, Kant R, Shankar R (2019) Bequeath life to death: state-of-art review on reverse logistics. J Clean Prod 211:503–520.

Priyadarshini P, Abhilash PC (2020) Policy recommendations for enabling transition towards sustainable agriculture in India. Land Use Policy 96:104718.

Pronti A, Coccia M (2020) Multicriteria analysis of the sustainability performance between agroecological and conventional coffee farms in the East Region of Minas Gerais (Brazil). Renew Agric Food Syst.

Rabadán A, González-Moreno A, Sáez-Martínez FJ (2019) Improving firms’ performance and sustainability: the case of eco-innovation in the agri-food industry. Sustain 11.

Raut RD, Luthra S, Narkhede BE et al (2019) Examining the performance oriented indicators for implementing green management practices in the Indian agro sector. J Clean Prod 215:926–943.

Recanati F, Marveggio D, Dotelli G (2018) From beans to bar: a life cycle assessment towards sustainable chocolate supply chain. Sci Total Environ 613–614:1013–1023.

Redclift M (2005) Sustainable development (1987–2005): an oxymoron comes of age. Sustain Dev 13:212–227.

Rezaei M, Soheilifard F, Keshvari A (2021) Impact of agrochemical emission models on the environmental assessment of paddy rice production using life cycle assessment approach. Energy Sources Part A Recover Util Environ Eff 1–16

Rigamonti L, Mancini E (2021) Life cycle assessment and circularity indicators. Int J Life Cycle Assess.

Risku-Norja H, Mäenpää I (2007) MFA model to assess economic and environmental consequences of food production and consumption. Ecol Econ 60:700–711.

Ritzén S, Sandström GÖ (2017) Barriers to the circular economy – integration of perspectives and domains. Procedia CIRP 64:7–12.

Rockström J, Steffen W, Noone K et al (2009) A safe operating space for humanity. Nature 461:472–475.

Roos Lindgreen E, Mondello G, Salomone R et al (2021) Exploring the effectiveness of grey literature indicators and life cycle assessment in assessing circular economy at the micro level: a comparative analysis. Int J Life Cycle Assess.

Roselli L, Casieri A, De Gennaro BC et al (2020) Environmental and economic sustainability of table grape production in Italy. Sustain 12.

Ross RB, Pandey V, Ross KL (2015) Sustainability and strategy in U.S. agri-food firms: an assessment of current practices. Int Food Agribus Manag Rev 18:17–48

Royo P, Ferreira VJ, López-Sabirón AM, Ferreira G (2016) Hybrid diagnosis to characterise the energy and environmental enhancement of photovoltaic modules using smart materials. Energy 101:174–189.

Ruggerio CA (2021) Sustainability and sustainable development: a review of principles and definitions. Sci Total Environ 786:147481.

Ruiz-Almeida A, Rivera-Ferre MG (2019) Internationally-based indicators to measure agri-food systems sustainability using food sovereignty as a conceptual framework. Food Secur 11:1321–1337.

Ryan M, Hennessy T, Buckley C et al (2016) Developing farm-level sustainability indicators for Ireland using the Teagasc National Farm Survey. Irish J Agric Food Res 55:112–125.

Saade MRM, Yahia A, Amor B (2020) How has LCA been applied to 3D printing? A systematic literature review and recommendations for future studies. J Clean Prod 244:118803.

Saitone TL, Sexton RJ (2017) Agri-food supply chain: evolution and performance with conflicting consumer and societal demands. Eur Rev Agric Econ 44:634–657.

Salim N, Ab Rahman MN, Abd Wahab D (2019) A systematic literature review of internal capabilities for enhancing eco-innovation performance of manufacturing firms. J Clean Prod 209:1445–1460.

Salimi N (2021) Circular economy in agri-food systems. In: Rezaei J (ed) Strategic decision making for sustainable management of industrial networks. Springer International Publishing, Cham, pp 57–70

Salomone R, Ioppolo G (2012) Environmental impacts of olive oil production: a life cycle assessment case study in the province of Messina (Sicily). J Clean Prod 28:88–100.

Sánchez AD, del Río-Rama MC, García JÁ (2017) Bibliometric analysis of publications on wine tourism in the databases Scopus and WoS. Eur Res Manag Bus Econ 23:8–15.

Saputri VHL, Sutopo W, Hisjam M, Ma’aram A (2019) Sustainable agri-food supply chain performance measurement model for GMO and non-GMO using data envelopment analysis method. Appl Sci 9.

Sassanelli C, Rosa P, Rocca R, Terzi S (2019) Circular economy performance assessment methods: a systematic literature review. J Clean Prod 229:440–453.

Schiefer S, Gonzalez C, Flanigan S (2015) More than just a factor in transition processes? The role of collaboration in agriculture. In: Sutherland LA, Darnhofer I, Wilson GA, Zagata L (eds) Transition pathways towards sustainability in agriculture: case studies from Europe. CPI Group, Croydon, UK, p 83

Seuring S, Muller M (2008) From a literature review to a conceptual framework for sustainable supply chain management. J Clean Prod 16:1699–1710.

Silvestri C, Silvestri L, Forcina A et al (2021) Green chemistry contribution towards more equitable global sustainability and greater circular economy: a systematic literature review. J Clean Prod 294.

Smetana S, Schmitt E, Mathys A (2019) Sustainable use of Hermetia illucens insect biomass for feed and food: attributional and consequential life cycle assessment. Resour Conserv Recycl 144:285–296.

Sonesson U, Berlin J, Ziegler F (2010) Environmental assessment and management in the food industry: life cycle assessment and related approaches. Woodhead Publishing, Cambridge

Soussana JF (2014) Research priorities for sustainable agri-food systems and life cycle assessment. J Clean Prod 73:19–23.

Soylu A, Oruç C, Turkay M et al (2006) Synergy analysis of collaborative supply chain management in energy systems using multi-period MILP. Eur J Oper Res 174:387–403.

Spaiser V, Ranganathan S, Swain RB, Sumpter DJ (2017) The sustainable development oxymoron: quantifying and modelling the incompatibility of sustainable development goals. Int J Sustain Dev World Ecol 24:457–470.

Stewart R, Niero M (2018) Circular economy in corporate sustainability strategies: a review of corporate sustainability reports in the fast-moving consumer goods sector. Bus Strateg Environ 27:1005–1022.

Stillitano T, Spada E, Iofrida N et al (2021) Sustainable agri-food processes and circular economy pathways in a life cycle perspective: state of the art of applicative research. Sustain 13:1–29.

Stone J, Rahimifard S (2018) Resilience in agri-food supply chains: a critical analysis of the literature and synthesis of a novel framework. Supply Chain Manag 23:207–238.

Strazza C, Del Borghi A, Gallo M, Del Borghi M (2011) Resource productivity enhancement as means for promoting cleaner production: analysis of co-incineration in cement plants through a life cycle approach. J Clean Prod 19:1615–1621.

Su B, Heshmati A, Geng Y, Yu X (2013) A review of the circular economy in China: moving from rhetoric to implementation. J Clean Prod 42:215–227.

Suárez-Eiroa B, Fernández E, Méndez-Martínez G, Soto-Oñate D (2019) Operational principles of circular economy for sustainable development: linking theory and practice. J Clean Prod 214:952–961.

Svensson G, Wagner B (2015) Implementing and managing economic, social and environmental efforts of business sustainability. Manag Environ Qual an Int Journal 26:195–213.

Tasca AL, Nessi S, Rigamonti L (2017) Environmental sustainability of agri-food supply chains: an LCA comparison between two alternative forms of production and distribution of endive in northern Italy. J Clean Prod 140:725–741.

Tassielli G, Notarnicola B, Renzulli PA, Arcese G (2018) Environmental life cycle assessment of fresh and processed sweet cherries in southern Italy. J Clean Prod 171:184–197.

Teixeira R, Pax S (2011) A survey of life cycle assessment practitioners with a focus on the agri-food sector. J Ind Ecol 15:817–820.

Tobergte DR, Curtis S (2013) ILCD Handbook. J Chem Info Model.

Tortorella MM, Di Leo S, Cosmi C et al (2020) A methodological integrated approach to analyse climate change effects in agri-food sector: the TIMES water-energy-food module. Int J Environ Res Public Health 17:1–21.

Tranfield D, Denyer D, Smart P (2003) Towards a methodology for developing evidenceinformed management knowledge by means of systematic review. Br J Manag 14:207–222

Trivellas P, Malindretos G, Reklitis P (2020) Implications of green logistics management on sustainable business and supply chain performance: evidence from a survey in the greek agri-food sector. Sustain 12:1–29.

Tsangas M, Gavriel I, Doula M et al (2020) Life cycle analysis in the framework of agricultural strategic development planning in the Balkan region. Sustain 12:1–15.

Ülgen VS, Björklund M, Simm N (2019) Inter-organizational supply chain interaction for sustainability : a systematic literature review.

UNEP S (2020) Guidelines for social life cycle assessment of products and organizations 2020.

UNEP/SETAC (2009) United Nations Environment Programme-society of Environmental Toxicology and Chemistry. Guidelines for social life cycle assessment of products. France

United Nations (2011) Guiding principles on business and human rights. Implementing the United Nations “protect, respect and remedy” framework

United Nations (2015) Transforming our world: the 2030 agenda for sustainable development.

Van Asselt ED, Van Bussel LGJ, Van Der Voet H et al (2014) A protocol for evaluating the sustainability of agri-food production systems - a case study on potato production in peri-urban agriculture in the Netherlands. Ecol Indic 43:315–321.

Van der Ploeg JD (2014) Peasant-driven agricultural growth and food sovereignty. J Peasant Stud 41:999–1030.

van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84:523–538.

Van Eck NJ, Waltman L (2019) Manual for VOSviwer version 1.6.10. CWTS Meaningful metrics 1–53

Vasa L, Angeloska A, Trendov NM (2017) Comparative analysis of circular agriculture development in selected Western Balkan countries based on sustainable performance indicators. Econ Ann 168:44–47.

Verdecho MJ, Alarcón-Valero F, Pérez-Perales D et al (2020) A methodology to select suppliers to increase sustainability within supply chains. Cent Eur J Oper Res.

Vergine P, Salerno C, Libutti A et al (2017) Closing the water cycle in the agro-industrial sector by reusing treated wastewater for irrigation. J Clean Prod 164:587–596.

WCED (1987) Our common future - call for action

Webster K (2013) What might we say about a circular economy? Some temptations to avoid if possible. World Futures 69:542–554

Wheaton E, Kulshreshtha S (2013) Agriculture and climate change: implications for environmental sustainability indicators. WIT Trans Ecol Environ 175:99–110.

Wijewickrama MKCS, Chileshe N, Rameezdeen R, Ochoa JJ (2021) Information sharing in reverse logistics supply chain of demolition waste: a systematic literature review. J Clean Prod 280:124359.

Woodhouse A, Davis J, Pénicaud C, Östergren K (2018) Sustainability checklist in support of the design of food processing. Sustain Prod Consum 16:110–120.

Wu R, Yang D, Chen J (2014) Social Life Cycle Assessment Revisited Sustain 6:4200–4226.

Yadav S, Luthra S, Garg D (2021) Modelling Internet of things (IoT)-driven global sustainability in multi-tier agri-food supply chain under natural epidemic outbreaks. Environ Sci Pollut Res 16633–16654.

Yee FM, Shaharudin MR, Ma G et al (2021) Green purchasing capabilities and practices towards Firm’s triple bottom line in Malaysia. J Clean Prod 307:127268.

Yigitcanlar T (2010) Rethinking sustainable development: urban management, engineering, and design. IGI Global

Zamagni A, Amerighi O, Buttol P (2011) Strengths or bias in social LCA? Int J Life Cycle Assess 16:596–598.


Author information

Authors and Affiliations

Department of Economy, Engineering, Society and Business Organization, University of Tuscia, Via del Paradiso 47, 01100 Viterbo, Italy

Cecilia Silvestri, Michela Piccarozzi & Alessandro Ruggieri

Department of Engineering, University of Rome "Niccolò Cusano," Via Don Carlo Gnocchi 3, 00166 Rome, Italy

Luca Silvestri


Corresponding author

Correspondence to Cecilia Silvestri .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Communicated by Monia Niero

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: a number of ill-placed paragraph headings were removed and the source indication "Authors' elaborations" was added to Tables 1-3.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX 31 KB)


About this article

Silvestri, C., Silvestri, L., Piccarozzi, M. et al. Toward a framework for selecting indicators of measuring sustainability and circular economy in the agri-food sector: a systematic literature review. Int J Life Cycle Assess (2022).


Received: 15 June 2021

Accepted: 16 February 2022

Published: 02 March 2022



  • Agri-food sector
  • Sustainability
  • Circular economy
  • Triple bottom line
  • Life cycle assessment
  • Systematic Review
  • Open access
  • Published: 12 September 2023

Treatment options for digital nerve injury: a systematic review and meta-analysis

  • Yi Zhang 1,2,
  • Nianzong Hou 2,4,
  • Jian Zhang 1,
  • Bing Xie 2,
  • Jiahui Liang 1,
  • Xiaohu Chang 1,
  • Kai Wang 3 &
  • Xin Tang 1

Journal of Orthopaedic Surgery and Research, volume 18, Article number: 675 (2023)


Surgical treatment of finger nerve injury is common for hand trauma. However, there are various surgical options with different functional outcomes. The aims of this study are to compare the outcomes of various finger nerve surgeries and to identify factors associated with the postsurgical outcomes via a systematic review and meta-analysis.

The literature on digital nerve repair was retrieved comprehensively by searching the PubMed database from January 1, 1965, to August 31, 2021. Data extraction, risk-of-bias assessment, and quality evaluation were then performed. Meta-analysis was performed on the postoperative static 2-point discrimination (S2PD) value, moving 2-point discrimination (M2PD) value, and the good rates of Semmes–Weinstein monofilament testing (SWMF) and the modified Highet classification of nerve recovery. Statistical analysis was performed using R (v3.6.3), with a random-effects model. A systematic review was also performed on other influencing factors, especially the type of injury and the postoperative complications of digital nerve repair.

Sixty-six studies with 2446 cases were included. The polyglycolic acid conduit group had the best S2PD value (6.71 mm), while the neurorrhaphy group had the best M2PD value (4.91 mm). End-to-side coaptation had the highest modified Highet good rate (98%), and autologous nerve graft had the highest SWMF good rate (91%). Age, the size of the gap, and the type of injury were factors that may affect recovery; the type of injury in particular affected the postoperative outcome of neurorrhaphy. Complications reported in the studies were mainly neuroma, cold sensitivity, paresthesia, postoperative infection, and pain.

Our study demonstrated that the results of surgical treatment of digital nerve injury are generally satisfactory; however, no repair method holds an absolute advantage. When choosing a surgical approach for digital nerve injury, surgeons must weigh multiple factors, especially the gap size of the nerve defect and the risk of postoperative complications.

Type of study/level of evidence: Therapeutic IV.

Finger nerve laceration is one of the most common injuries in hand trauma and accounts for a large share of peripheral nerve injuries of the upper limb [ 1 ]. Most hand injuries with nerve damage require surgical treatment [ 2 ]. Common complications of either surgical or non-surgical treatment include numbness, paresthesia, neuroma, and cold intolerance [ 3 ].

Finger nerve repair currently has two main surgical approaches. End-to-end tension-free neurorrhaphy has traditionally been the preferred repair method for lesions with a gap smaller than 5 mm [ 2 ]. When the nerve ends cannot be approximated without tension, nerve reconstruction becomes the most commonly used method [ 4 ]. Various materials are available for reconstruction, such as non-nerve autografts, nerve autografts, nerve allografts, and artificial conduits; end-to-side anastomosis is also commonly used to reconstruct large nerve defects. Non-nerve autograft materials mainly include veins and muscle-in-vein constructs [ 5 ]. The autologous nerve graft is the historical gold standard for nerve reconstruction [ 2 ]. However, harvesting a donor nerve damages the patient's own tissue, increases operative time, and adds potential donor-site morbidity [ 6 ]. With improvements in technology and repair materials, nerve conduit repair and allogeneic nerve repair are now available; both techniques avoid the donor-site complications of autologous nerve transplantation [ 5 ]. Synthetic nerve conduits include polyglycolic acid (PGA) tubes and collagen tubes. Potential complications of allogeneic transplantation, however, include the transmission of infectious diseases [ 5 ]. For large-segment defects or proximal nerve damage, some authors have tried end-to-side nerve anastomosis, which bridges the damaged nerve to a healthy nerve [ 7 ].

In addition to the surgical method, other predictors of sensory recovery have been evaluated in several studies, such as mechanism of injury, gender, age, involved digit, level of injury, time from injury to repair, and gap length. The main one is the type of injury, which can affect the severity of the nerve damage, the gap between the nerve stumps, and recovery after surgery. According to Kusuhara et al. [ 8 ], avulsion injuries had significantly lower rates of meaningful recovery than clean-cut and crush injuries. However, Schmauss et al. [ 9 ] observed no significant difference between sharp and crush injuries.

Few systematic reviews and meta-analyses have compared surgical approaches and factors associated with sensory outcomes of digital nerve repair [ 2 , 3 , 5 , 10 , 11 , 12 , 13 ]. In Paprottka et al.'s 2013 review, some of the included studies were of low quality, and allogeneic nerve repairs were not compared [ 5 ]. The 2019 studies by Herman et al. and Mauch et al. [ 8 ] included fewer articles and performed only limited subgroup analyses because of small sample sizes [ 2 , 10 ]. Thus, we aimed to perform a comprehensive meta-analysis and systematic review of finger nerve repair that includes high-quality studies with large sample sizes and conducts detailed subgroup analyses comparing different surgical approaches. We also aimed to identify factors associated with the functional outcomes of finger nerve repair.

We performed and reported this review based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Search strategy and inclusion/exclusion criteria

We performed a systematic literature search in PubMed. The search terms "digital nerve," "operation," "surgery," "nerve injury," and "nerve repair" were combined using Boolean operators, and both free-text and MeSH term searches were completed. We imposed no language restrictions. The publication date was limited to January 1, 1965, through August 31, 2021, because clinical use of the surgical microscope began around 1965; earlier, non-microsurgical repairs were not included [ 14 ]. Additionally, we reviewed the reference lists of the included papers and previously published reviews to ensure relevant studies had been considered. We merged all search results and discarded duplicate citations [ 2 , 3 , 5 , 10 , 11 , 12 , 13 ].
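The stated terms can be assembled into a Boolean query along these lines. The authors' exact PubMed query string is not reported, so the concept grouping and the [PDAT] date tags below are assumptions for illustration only:

```python
# Hypothetical reconstruction of the Boolean search; the exact grouping
# and field tags used by the authors are not reported in the text.
injury_terms = ['"digital nerve"', '"nerve injury"', '"nerve repair"']
procedure_terms = ['"operation"', '"surgery"']

# OR within each concept, AND between concepts.
query = "({}) AND ({})".format(
    " OR ".join(injury_terms), " OR ".join(procedure_terms)
)
# Restrict to the review's publication window (microsurgery era onward).
date_filter = '("1965/01/01"[PDAT] : "2021/08/31"[PDAT])'
full_query = "{} AND {}".format(query, date_filter)
print(full_query)
```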

Two authors independently screened the articles by title and abstract, and each then independently retrieved and examined the full texts of relevant papers against predefined criteria. We included all prospective and retrospective studies on surgical treatment of finger nerve injuries, including observational cohort studies, randomized controlled trials, and case reports with detailed data, covering patients of all ages and with at least 6 months of follow-up. Outcomes were analyzed from the data published in the included studies. Exclusion criteria were peripheral nerve lesions not localized to the digital nerves of the hand, duplicated data, inappropriate data analysis methods, inconsistent data, reviews, unpublished literature, conference papers, and studies without adequate information. The PRISMA flowchart is shown in Fig.  1 .

figure 1

Flowchart of studies identified, included, and excluded

Data extraction and outcome measures

The primary author extracted data onto a predefined electronic data extraction form, and the other author then checked all the data. Any disagreements were resolved through discussion, with the involvement of a third reviewer if necessary. From each included study we extracted: study characteristics (author, nationality, study type, hospital, date), population characteristics (age, gender, sample size, number lost to follow-up, number of injured nerves, smoking, type of injury), damage and repair status (nerve gap, repair time, type of surgery, follow-up time), and complications (postoperative neuroma, cold sensitivity, paresthesia, postoperative infection, pain).

The outcome measures were: static 2-point discrimination (S2PD), moving 2-point discrimination (M2PD), Semmes–Weinstein monofilament testing (SWMF), and the modified Highet classification of nerve recovery [ 3 ]. S2PD, first described by Weber in 1835, is the most widely used outcome measure; normal values in an uninjured fingertip range from 2 to 6 mm. M2PD was described by Dellon, and we used it as the second outcome indicator of postoperative digital nerve recovery. S2PD and M2PD grade nerve recovery by an actual measured distance; both are continuous variables, and the shorter the distance, the better the recovery.

We grouped SWMF outcomes using a modified classification system derived from Imai et al.: scores ≤ 2.83 indicate "normal" sensation, scores from 2.83 to 4.31 "diminished light touch," scores from 4.31 to 4.56 "diminished protective sensation," scores from 4.56 to 6.10 "loss of protective sensation," and scores > 6.10 "anesthetic" [ 15 ]. We counted the number of patients with a score below 4.31 (normal sensation or diminished light touch) to calculate the excellent and good rate of recovery.
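The grouping above can be sketched as a small helper. How a score falling exactly on a boundary (e.g., 4.31) was handled is not specified in the text, so the ≤ comparisons below are an assumption:

```python
def swmf_category(score: float) -> str:
    """Map a Semmes-Weinstein monofilament score to the modified
    Imai et al. category used in this review (boundary handling assumed)."""
    if score <= 2.83:
        return "normal"
    if score <= 4.31:
        return "diminished light touch"
    if score <= 4.56:
        return "diminished protective sensation"
    if score <= 6.10:
        return "loss of protective sensation"
    return "anesthetic"


def swmf_good_rate(scores) -> float:
    """Share of patients with normal sensation or diminished light touch."""
    good = [s for s in scores
            if swmf_category(s) in ("normal", "diminished light touch")]
    return len(good) / len(scores)
```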

The Medical Research Council scoring system from 1954, as modified by Mackinnon and Dellon and often referred to as the modified Highet classification, groups ranges of values under subjective headings [ 3 ]. This scoring system is often used to evaluate recovery after nerve repair; the specific criteria are shown in Table 1 . We extracted the numbers of nerves with good and excellent sensory recovery to evaluate the effect of treatment.

The S2PD and modified Highet data sets comprised the most articles and the most detailed data. We therefore divided artificial conduits into two subgroups, collagen tubes and polyglycolic acid tubes; divided the autograft method into vein grafts and muscle-in-vein grafts; and split neurorrhaphy into direct suture and end-to-side anastomosis. For these two data sets, the eight repair types analyzed were: artificial conduit: polyglycolic acid; artificial conduit: collagen; nerve allograft; autograft repair: muscle-in-vein graft; autograft repair: vein graft; autologous nerve graft; end-to-end coaptation; and end-to-side coaptation.

Fewer articles contributed to the M2PD and SWMF data sets, so the extracted data were limited. For these, we did not conduct a detailed subgroup analysis but merged the data into five repair types: artificial conduit (collagen tube/polyglycolic acid tube), nerve allograft, autograft repair (muscle-in-vein graft/vein graft), autologous nerve graft, and neurorrhaphy (end-to-end coaptation/end-to-side coaptation).

In addition to evaluating the outcomes of the surgical repair methods, we summarized and analyzed other factors associated with the result, mainly age, nerve gap, injury type, repair time, and smoking. The most important of these is the type of injury, which affects the degree of nerve damage, the choice of surgical method, and postoperative recovery. Through further screening of the included literature, we analyzed 25 articles [ 1 , 7 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 ] with specific injury descriptions. We divided injury types into sharp injuries (cutting, acute or semi-sharp, and stab injuries) and crush injuries (severe crush, mangled, and lacerated injuries). We compared patients with the two injury types across four types of surgery, using S2PD and the modified Highet excellent and good rate as the analysis indices.

Complications reported in the studies were mainly neuroma, cold sensitivity, paresthesia, postoperative infection, and pain. We also conducted a summary analysis.

Statistical analysis, risk of bias, and study quality assessment

Our meta-analysis was performed with R (v3.6.3) and the meta package. The I² statistic was used to assess heterogeneity. To reduce the influence of between-study differences and avoid errors caused by heterogeneity, a random-effects model was used to pool the statistics. For postoperative S2PD and M2PD of the various surgical methods, we pooled means and standard deviations; for the SWMF and modified Highet excellent and good rates, we pooled proportions. The pooled results are displayed in forest plots, and the statistics are compared in tables. We used funnel plots and Egger's test to assess publication bias. In the analysis by surgical method and injury type, the continuous S2PD values were compared with the t test, and the excellent and good rates were compared with the chi-square test.
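The authors pooled results with the R meta package. As a minimal illustration of what random-effects pooling of study means involves, the sketch below implements DerSimonian–Laird pooling in Python; the specific estimator is an assumption, since the text does not name one:

```python
import math


def dl_random_effects(means, sds, ns):
    """Minimal DerSimonian-Laird random-effects pooling of study means;
    a sketch of the kind of computation R's meta package performs."""
    k = len(means)
    var = [sd ** 2 / n for sd, n in zip(sds, ns)]  # variance of each study mean
    w = [1 / v for v in var]                       # fixed-effect weights
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    # Cochran's Q and the between-study variance tau^2.
    Q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)
    # Random-effects weights, pooled mean, 95% CI, and I^2 (%).
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (Q - (k - 1)) / Q) * 100 if Q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2
```

With equally weighted studies the pooled estimate reduces to the simple mean, while a large I² signals that most observed variation is between-study heterogeneity, which is what motivates the random-effects choice.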

We evaluated all included studies with the standardized critical appraisal instrument from the JBI Meta-Analysis of Statistics Assessment and Review Instrument (JBI-MAStARI) (Appendix II). Because all included studies were case series or cohort studies, we used the JBI Critical Appraisal Checklist for Descriptive/Case Series studies. This checklist contains 9 quality items, each judged yes, no, unclear, or not applicable. Studies with "yes" scores of at least 80% were considered high quality, those scoring 60–80% were rated medium, and those below 60% were considered low quality. Any disagreements between reviewers were resolved through discussion.
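The grading rule can be sketched as follows, assuming the 80% boundary counts as high and the 60% boundary as medium (the text does not state how exact boundary scores were handled):

```python
def jbi_quality(yes_count: int, total_items: int = 9) -> str:
    """Grade a study from its JBI checklist 'yes' proportion, using the
    cutoffs stated in the text: >=80% high, 60-80% medium, <60% low."""
    share = yes_count / total_items
    if share >= 0.8:
        return "high"
    if share >= 0.6:
        return "medium"
    return "low"
```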

Study selection

Our keyword search of the PubMed database yielded 403 unique publications, and examining the reference lists of the included papers and previous reviews added 45 records. Sixty-six articles were included in the final data analysis [ 1 , 7 , 8 , 9 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 , 74 , 75 , 76 , 86 ] (Fig.  1 ).

Study characteristics

The 66 articles included a total of 2446 cases. Fifty studies [ 1 , 7 , 16 , 19 , 21 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 41 , 42 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 , 74 , 75 , 76 , 86 ] were retrospective case series, and 16 [ 8 , 9 , 17 , 18 , 20 , 22 , 23 , 24 , 40 , 43 , 44 , 53 , 54 , 55 , 56 , 57 ] were prospective. Among these, 16 were controlled studies [ 20 , 21 , 28 , 29 , 38 , 40 , 41 , 42 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 ]. For three papers, we extracted only part of the data because they also included nerve injuries other than digital nerves [ 7 , 32 , 61 ]. The ages of the included patients ranged from 1 to 81 years, the time from injury to surgical repair ranged from 0 to 37 months, and follow-up ranged from 6 to 202 months. The detailed characteristics of the eligible studies are shown in Table 2 .

Quality assessment and publication bias

All 66 articles were assessed with the JBI-MAStARI evaluation tool, and all were rated high or medium quality. The specific evaluation results are shown in Tables 2 , 3 and 4 . The P values from Egger's test indicated no evidence of publication bias in most meta-analyses; the results are summarized in Tables 5 , 6 , 7 , 8 and 9 .

Synthesis of results

All the data extracted from the literature are shown in Table 2 . The S2PD, Highet score, M2PD, and SWMF sensory results are summarized in Tables 5 , 6 , 7 and 8 .

A total of 51 articles reported S2PD data [ 8 , 9 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 27 , 28 , 29 , 30 , 31 , 35 , 36 , 37 , 38 , 39 , 40 , 42 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 76 , 86 ]. On pooled analysis, the polyglycolic acid conduit group had the smallest discrimination distance, 6.71 mm (95% CI 4.46; 8.96), and the end-to-end coaptation group the largest, 8.80 mm (95% CI 7.63; 9.97). The values of the other groups fell between these two; all remained at the good level (7–15 mm) rather than the excellent level (2–6 mm) (Table 5 , Figs. 2 , 3 ).

figure 2

Static 2-point discrimination results for each repair technique

figure 3

Forest plot of static 2-point discrimination results for each repair technique. a Forest plot of S2PD—Artificial conduit: polyglycolic acid; b Forest plot of S2PD—Artificial conduit: collagen; c Forest plot of S2PD—nerve allografts; d Forest plot of S2PD—autograft repair: muscle-in-vein graft; e Forest plot of S2PD—autograft repair: vein graft; f Forest plot of S2PD—autologous nerve graft; g Forest plot of S2PD—end-to-end coaptation; and h Forest plot of S2PD—end-to-side coaptation

Sixty-one articles contributed to the excellent and good rate of the modified Highet classification [ 1 , 7 , 8 , 9 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 41 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 58 , 59 , 60 , 61 , 62 , 64 , 65 , 66 , 67 , 68 , 69 , 71 , 72 , 73 , 74 , 75 , 76 , 86 ]. The end-to-side coaptation group had the highest postoperative rate, 98% (95% CI 0.85; 1.00), and the polyglycolic acid conduit group the lowest, 74% (95% CI 0.53; 0.91) (Table 6 , Figs. 4 , 5 ).

figure 4

Modified Highet classification good rate for each repair technique

figure 5

Forest plot of modified Highet classification good rate for each repair technique. a Forest plot of modified Highet classification good rate—Artificial conduit: polyglycolic acid; b Forest plot of modified Highet classification good rate—Artificial conduit: collagen; c Forest plot of modified Highet classification good rate—nerve allograft; d Forest plot of modified Highet classification good rate—autograft repair: muscle-in-vein graft; e Forest plot of modified Highet classification good rate—autograft repair: vein graft; f Forest plot of modified Highet classification good rate—autologous nerve graft; g Forest plot of modified Highet classification good rate—end-to-end coaptation; and h Forest plot of modified Highet classification good rate—end-to-side coaptation

The M2PD group included 19 articles [ 17 , 20 , 23 , 24 , 27 , 28 , 36 , 37 , 39 , 40 , 41 , 45 , 47 , 50 , 54 , 57 , 60 , 68 , 69 ]. The neurorrhaphy group had the smallest discrimination distance, 4.91 mm (95% CI 3.72; 6.09), and the autograft repair group the largest, 7.06 mm (95% CI 5.58; 8.54). None of the five groups reached the excellent level (2–3 mm), but all were at the good level (4–7 mm) (Table 7 , Figs. 6 , 7 ).

figure 6

Moving 2-point discrimination results for each repair technique

figure 7

Forest plot of moving 2-point discrimination results for each repair technique. a Forest plot of M2PD—artificial conduit; b Forest plot of M2PD—nerve allograft; c Forest plot of M2PD—autograft repair; d Forest plot of M2PD—autologous nerve graft; and e Forest plot of M2PD—neurorrhaphy

Twenty-nine studies were included in the SWMF data set [ 9 , 16 , 18 , 19 , 20 , 22 , 23 , 25 , 27 , 28 , 29 , 30 , 36 , 45 , 46 , 47 , 49 , 52 , 53 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 73 , 76 , 86 ]. The highest excellent and good rate was 91% (95% CI 0.80; 0.99) in the autologous nerve graft group; the lowest was 61% (95% CI 0.40; 0.80) in the autograft repair group (Table 8 , Figs. 8 , 9 ).

figure 8

Semmes–Weinstein monofilament testing good rate for each repair technique

figure 9

Forest plot of Semmes–Weinstein monofilament testing good rate for each repair technique. a Forest plot of Semmes–Weinstein monofilament testing good rate—artificial conduit; b Forest plot of Semmes–Weinstein monofilament testing good rate—nerve allografts; c Forest plot of Semmes–Weinstein monofilament testing good rate—autograft repair; d Forest plot of Semmes–Weinstein monofilament testing good rate—autologous nerve graft; and e Forest plot of Semmes–Weinstein monofilament testing good rate—neurorrhaphy

Finally, we pooled all data across the four outcome indicators: S2PD was 8.18 mm (95% CI 7.66, 8.70), M2PD was 5.90 mm (95% CI 5.34, 6.46), the modified Highet excellent and good rate was 80% (95% CI 0.74, 0.86), and the SWMF excellent and good rate was 81% (95% CI 0.72, 0.88) (Table 9 , Figs. 10 , 11 , 12 , 13 ).

figure 10

Forest plot of static 2-point discrimination results

figure 11

Forest plot of moving 2-point discrimination results

figure 12

Forest plot of modified Highet classification good rate

figure 13

Forest plot of Semmes–Weinstein monofilament testing good rate

We extracted data from 25 articles for subgroup analysis by injury type. For S2PD values, there was no significant difference between sharp and crush injuries for any of the four surgical methods ( P  > 0.05). For the excellent and good rate, sharp injuries recovered better than crush injuries only with neurorrhaphy ( P  = 0.00004472); there was no statistical difference for the other methods (Tables 12 , 13 ).

We also tallied the other influencing factors reported in the included literature and summarized the complications. Regarding age, 13 articles considered it to have an impact [ 1 , 21 , 32 , 33 , 34 , 36 , 55 , 57 , 60 , 67 , 72 , 73 , 74 ], and nine found no effect [ 9 , 20 , 43 , 45 , 63 , 65 , 66 , 71 , 75 ]. Regarding the nerve gap, 11 papers deemed it influential [ 9 , 21 , 26 , 40 , 43 , 44 , 51 , 52 , 71 , 72 , 74 ], and five found no influence [ 20 , 32 , 60 , 65 , 67 ]. Regarding repair time, four articles considered it influential [ 8 , 27 , 52 , 60 ], and ten found no effect [ 9 , 32 , 35 , 43 , 63 , 65 , 66 , 71 , 73 , 75 ]. Regarding smoking, three papers found an effect [ 33 , 40 , 73 ], and four did not [ 9 , 43 , 45 , 63 ] (Table 10 ).

The pooled analysis of complications showed the following. Twelve articles reported neuroma [ 21 , 29 , 32 , 38 , 44 , 47 , 56 , 57 , 62 , 63 , 64 , 68 ], with 14 countable cases (artificial conduit: 2 articles, 3 cases; autograft repair: 7 articles, 7 cases; nerve suture: 3 articles, 4 cases). Thirteen publications reported cold stimulation [ 27 , 29 , 30 , 32 , 37 , 38 , 49 , 58 , 63 , 67 , 68 , 69 , 70 ], with 50 countable cases (autograft repair: 10 articles, 47 cases; nerve suture: 3 articles, 3 cases). Seventeen papers reported paresthesia [ 1 , 9 , 21 , 27 , 29 , 30 , 32 , 33 , 38 , 44 , 49 , 62 , 63 , 65 , 67 , 71 , 76 ], with 15 countable cases (artificial conduit: 3 articles, 1 case; autograft repair: 11 articles, 14 cases; nerve suture: 3 articles). Six articles reported postoperative infection [ 20 , 21 , 40 , 45 , 53 , 69 ], with 10 countable cases (artificial conduit: 3 articles, 5 cases; nerve allograft: 2 articles, 4 cases; autograft repair: 1 article, 1 case). Thirteen articles reported pain [ 20 , 21 , 23 , 29 , 37 , 38 , 39 , 49 , 50 , 53 , 58 , 67 , 70 ], with 23 countable cases (artificial conduit: 2 articles, 1 case; nerve allograft: 3 articles, 9 cases; autograft repair: 6 articles, 12 cases; nerve suture: 2 articles, 1 case) (Table 10 ).

We analyzed the maximum extent of nerve defect treated by each surgical method in the literature. Direct suture requires tension-free coaptation and was used for defects within 0.5 cm. Autogenous nerve grafting repaired the largest defects, ranging from 0.5 to 9.0 cm. The end-to-side anastomosis technique, a form of nerve transposition or bridging, had no limitation on defect length (Table 11 ).

It has been reported that the digital nerves are the most commonly injured of all peripheral nerves [ 77 ]. The published literature describes many ways to repair digital nerve injury; however, clinical practice has lacked consensus. Thus, we analyzed the published literature on digital nerve injury.

Judged by S2PD and the modified Highet scoring system, tension-free end-to-end coaptation was the most common method of nerve repair, but compared with the other defect repair methods it showed no obvious advantage. Autologous nerve transplantation likewise showed no absolute advantage. As a newer material for repairing nerve defects, allogeneic nerve has been widely used; compared with autologous nerve it has no obvious advantage in regeneration, but it avoids the postoperative complications caused by nerve harvesting while achieving a comparable effect. There were some differences between PGA conduits and collagen conduits. In 2003, Laroas et al. published results on 28 PGA-conduit repairs, reporting that with sensory re-education the success rate could be increased to 100% [ 78 ]. In 2007, Waitayawinyu et al. found better results with collagen conduits than with PGA conduits [ 79 ]. Our statistical results showed no significant difference between the two conduits. Vein grafts and muscle-in-vein grafts, as autografts, must also be harvested from a donor site, but they are less damaging to the donor site than autologous nerves; the two methods had equivalent results and no absolute advantage over the other methods. For large-segment defects or proximal nerve damage, the end-to-side anastomosis technique was effective, with the highest excellent and good rate among the 8 methods. Experimental end-to-side nerve suture was first introduced by Kennedy [ 80 ], but it was not widely adopted clinically at the time.
Viterbo et al., who developed the modern approach of end-to-side neurorrhaphy without harming the donor nerve, a paradigm-shifting idea at the time, conducted their research in rats: they sectioned the peroneal nerve and sutured its distal stump to the lateral face of the tibial nerve after removing a small epineurial window, demonstrating that the coapted nerve had electrophysiological function and proving that end-to-side anastomosis is feasible [ 81 , 82 , 83 ]. Mennen first reported the use of this technique in humans in 1996 with good results [ 84 ]. In 2003, Mennen reported 56 cases of end-to-side anastomosis, including 5 cases of digital nerve repair, with good neurological recovery [ 7 ]. Since then, four other scholars have reported related studies, but with very small case numbers. Recently, new techniques and materials have been applied as variants of end-to-side coaptation; however, Geuna et al. proposed that bioactive conduit materials, gene therapy, the role of Schwann cells, and attractant factors derived from the severed trunk require further study [ 85 ]. As a newer surgical method, end-to-side repair of the digital nerve has been studied little: a total of 5 articles [ 7 , 37 , 64 , 70 , 86 ] and 49 cases were included in our study, and some data could not be extracted. Thus, there may be publication bias.

The autografts (muscle-in-vein graft/vein graft) had the worst SWMF excellent and good rate and M2PD results. These two techniques have disadvantages over longer distances, such as collapse of the vein or dispersion of the regenerating axons out of the muscle [ 47 ]. Nevertheless, we found that none of these methods had significantly different results. Our results are similar to those of the meta-analyses in [ 11 , 12 , 13 ].

Through a summary analysis of all the data on the 4 outcome measures, we found that most patients recovered well after nerve injury repair. According to the modified Highet classification of nerve recovery, both S2PD and M2PD achieved S3 + or better, and the Highet score and SWMF excellent and good rates were both above 80% (Table 1 ). We found that surgical repair was significantly better than no repair, consistent with the study by Chow et al. [ 56 ]. In Chow's study, 2-year follow-up outcomes were compared between digital nerve repair and no repair: 90% of the 76 patients with nerve repair achieved S3 + or better at 2 years, compared with only 6% of the 36 patients with unrepaired digital nerves. On the other hand, the meta-analysis of Dunlop et al. found little difference between repair and non-repair; the differing conclusions may reflect the different studies included in each analysis [ 3 ].
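Pooled rates such as the excellent and good rates above are reported with 95% confidence intervals. One common way to attach a CI to a single pooled proportion is the Wilson score interval, which behaves better than the simple Wald interval near 0 or 1. A minimal sketch, with hypothetical counts rather than the review's actual patient numbers:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion (e.g. an excellent/good rate)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical: 160 of 200 pooled patients graded S3+ or better.
lo, hi = wilson_ci(160, 200)
print(f"rate = 0.80 (95% CI {lo:.2f}, {hi:.2f})")
```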

The surgical approach significantly affects recovery after nerve injury and is a critical factor in surgical intervention. The mechanism of injury is another important factor that may affect the degree of damage, the length of the nerve defect, the choice of surgical method, and the outcome of postoperative recovery. Many of the studies included in our analysis examined this factor. Nine studies, including that of Kusuhara et al. [ 8 , 18 , 21 , 33 , 43 , 52 , 60 , 72 , 74 ], suggested that the type of injury affected postoperative neurological recovery, while nine others, including that of Schmauss et al. [ 1 , 9 , 34 , 45 , 57 , 63 , 66 , 73 , 75 ], reported that it did not. We also analyzed the data for this factor: through further screening of the included literature, we analyzed 25 articles with specific injury descriptions. Regarding S2PD values, sharp injuries recovered somewhat better than blunt injuries after all four types of surgery, but without a clear absolute advantage. In terms of the excellent and good rate, sharp injury recovered significantly better than blunt injury after neurorrhaphy, with no significant difference for the other three surgical methods. This is likely because blunt injury can cause extensive nerve damage, so only conduit repair or nerve grafting can be selected: once the damaged nerve segment is removed, the remaining stumps are healthy, and the two injury mechanisms no longer differ significantly in their effect on the nerve. However, if the damaged segment is not resected but directly coapted, the bluntly injured nerve tissue is unhealthy and impairs postoperative recovery. Sharp injury damages the nerve less, so recovery after neurorrhaphy is good, while recovery from blunt injury is poor.
Therefore, when dealing with blunt nerve injury, the damaged nerve segment should be removed, and the appropriate surgical method should be selected according to the length of the nerve defect.

Other factors may also affect postoperative recovery after nerve repair. Five of the included studies showed that age affected nerve recovery; in particular, children recovered better after nerve repair than adults and the elderly [ 1 , 33 , 34 , 36 , 74 ]. Repair time, smoking, and follow-up time may have little effect on recovery after nerve repair. In 2015, a study by Fakin et al. found that the surgeon's experience was also a predictor of outcome. Repair of the digital artery accompanying the digital nerve had little effect on postoperative recovery, as also concluded by Hohendorff et al. [ 63 , 87 ]. In 1985, Sullivan et al. and Murakami et al. found that the number of digital nerves repaired made no difference to the result [ 35 , 88 ]. A 2016 study by Bulut et al. found that recovery after digital nerve repair was independent of sex and of which finger was injured [ 73 ]. In 1981, Young et al. compared simple epineurial repair with perineurial repair and found no significant difference in recovery [ 55 ]. A 2016 study by Sladana et al. deemed splinting after nerve repair necessary [ 72 ]. Thomas et al. found that results obtained using an operating microscope were significantly better than those obtained using loupe magnification [ 89 ].

Our analysis of the postoperative complications in the included literature found that neuroma, cold stimulation, paresthesia, and pain were reported most often after autograft surgery. This may be due to damage at the donor site and poor recovery at the recipient site after transplantation. With respect to complications, allogeneic nerves and nerve conduits performed better than autografts.

Our analysis showed that the length of the nerve defect affects postoperative recovery and limits the choice of surgical method. Of course, other factors must also be considered, such as complications, economic conditions, local hospital capability, and repair materials. When multiple options are available for a given repair gap, the clinical factors associated with recovery must inform the decision. Since there were no significant differences in the outcomes of the various surgical methods, the surgeon should choose a reasonable treatment plan based on the clinical scenario.

Our study has several limitations. First, its quality is limited by the quality of the included studies, which were mostly case series (level 4 evidence). Second, the strength of our conclusions is limited by the heterogeneous and incomplete outcome data reported across the included studies and by publication bias in the individual studies analyzed. In addition, not every study reported the excellent rate of the Highet score in the same manner; we therefore used the S2PD and M2PD classification systems to group results into categories that were comparable across sensory outcomes.


Our study demonstrated that the results of surgical treatment of digital nerve injury are generally satisfactory; however, no nerve repair method has an absolute advantage. When choosing a surgical method to repair digital nerve injury, various factors must be weighed comprehensively, especially the type of injury, the gap size of the nerve defect, the injury to the patient's donor site, postoperative complications, the patient's economic circumstances, and the capability of the local hospital. Whenever tension-free coaptation is possible, end-to-end nerve coaptation remains the method of choice. For nerve defects, nerve conduits and allogeneic nerves offer relative advantages. When the proximal nerve is damaged and cannot be connected, the end-to-side anastomosis technique can be selected for bridging repair. Age, gap size, and injury type are also factors that may affect recovery. Given the limitations of this study, including the low quality of the included studies, high heterogeneity, incomplete outcome data, and publication bias, our conclusions should be interpreted with caution. More high-quality randomized controlled studies are needed before a conclusive statement can be made.

Availability of data and materials

This study included articles which are available via PubMed. All information analyzed in this study was collected in a data set, and this is available from the corresponding author on reasonable request.


Abbreviations

S2PD: Static 2-point discrimination
M2PD: Moving 2-point discrimination
SWMF: Semmes–Weinstein monofilament testing
PGA: Polyglycolic acid (tubes)
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
CI: Confidence interval
MAStARI: JBI Meta-Analysis of Statistics Assessment and Review Instrument
JBI: Australia's Joanna Briggs Institute

Efstathopoulos D, Gerostathopoulos N, Misitzis D, Bouchlis G, Anagnostou S, Daoutis NK. Clinical assessment of primary digital nerve repair. Acta Orthop Scand Suppl. 1995;264:45–7.

Herman ZJ, Ilyas AM. Sensory outcomes in digital nerve repair techniques: an updated meta-analysis and systematic review. Hand. 2019;15(2):157–64.

Dunlop RLE, Wormald JCR, Jain A. Outcome of surgical repair of adult digital nerve injury: a systematic review. BMJ Open. 2019;9(3):e025443.

Griffin JW, Hogan MV, Chhabra AB, Deal DN. Peripheral nerve repair and reconstruction. J Bone Jt Surg Am. 2013;95(23):2144–51.

Paprottka FJ, Wolf P, Harder Y, Kern Y, Paprottka PM, Machens HG, Lohmeyer JA. Sensory recovery outcome after digital nerve repair in relation to different reconstructive techniques: meta-analysis and systematic review. Plast Surg Int. 2013;2013: 704589.

Staniforth P, Fisher TR. The effects of sural nerve excision in autogenous nerve grafting. Hand. 1978;10(2):187–90.

Mennen U. End-to-side nerve suture in clinical practice. Hand Surg Int J Devoted Hand Upper Limb Surg Relat Res J Asia-Pac Fed Soc Surg Hand. 2003;8(1):33–42.

Kusuhara H, Hirase Y, Isogai N, Sueyoshi Y. A clinical multi-center registry study on digital nerve repair using a biodegradable nerve conduit of PGA with external and internal collagen scaffolding. Microsurgery. 2019;39(5):395–9.

Schmauss D, Finck T, Liodaki E, Stang F, Megerle K, Machens HG, Lohmeyer JA. Is nerve regeneration after reconstruction with collagen nerve conduits terminated after 12 months? The long-term follow-up of two prospective clinical studies. J Reconstr Microsurg. 2014;30(8):561–8.

Mauch JT, Bae A, Shubinets V, Lin IC. A systematic review of sensory outcomes of digital nerve gap reconstruction with autograft, allograft, and conduit. Ann Plast Surg. 2019;82:S247–55.

Rinkel WD, Huisstede BM, van der Avoort DJ, Coert JH, Hovius SE. What is evidence based in the reconstruction of digital nerves? A systematic review. J Plast Reconstr Aesthet Surg. 2013;66(2):151–64.

Kim JS, Bonsu N-Y, Leland HA, Carey JN, Patel KM, Seruya M. A systematic review of prognostic factors for sensory recovery after digital nerve reconstruction. Ann Plast Surg. 2018;80:S311–6.

Mermans JF, Franssen BB, Serroyen J, Van der Hulst RR. Digital nerve injuries: a review of predictors of sensory recovery after microsurgical digital nerve repair. Hand (N Y). 2012;7(3):233–41.

Smith JW. Microsurgery of peripheral nerves. Plast Reconstr Surg. 1964;33:317–29.

Imai H, Tajima T, Natsuma Y. Interpretation of cutaneous pressure threshold (Semmes–Weinstein monofilament measurement) following median nerve repair and sensory reeducation in the adult. Microsurgery. 1989;10(2):142–4.

Bushnell BD, McWilliams AD, Whitener GB, Messer TM. Early clinical experience with collagen nerve tubes in digital nerve repair. J Hand Surg. 2008;33(7):1081–7.

Taras JS, Jacoby SM, Lincoski CJ. Reconstruction of digital nerves with collagen conduits. J Hand Surg Am. 2011;36(9):1441–6.

Arnaout A, Fontaine C, Chantelot C. Sensory recovery after primary repair of palmar digital nerves using a Revolnerv ((R)) collagen conduit: a prospective series of 27 cases. Chir Main. 2014;33(4):279–85.

Thomsen L, Bellemere P, Loubersac T, Gaisne E, Poirier P, Chaise F. Treatment by collagen conduit of painful post-traumatic neuromas of the sensitive digital nerve: a retrospective study of 10 cases. Chir Main. 2010;29(4):255–62.

Means KR, Rinker BD, Higgins JP, Payne SH, Merrell GA, Wilgis EFS. A multicenter, prospective, randomized, pilot study of outcomes for digital nerve repair in the hand using hollow conduit compared with processed allograft nerve. Hand. 2016;11(2):144–51.

Rbia N, Bulstra LF, Saffari TM, Hovius SER, Shin AY. Collagen nerve conduits and processed nerve allografts for the reconstruction of digital nerve gaps: a single-institution case series and review of the literature. World Neurosurg. 2019;127:e1176–84.

Guo Y, Chen G, Tian G, Tapia C. Sensory recovery following decellularized nerve allograft transplantation for digital nerve repair. J Plast Surg Hand Surg. 2013;47:1–3.

Taras JS, Amin N, Patel N, McCabe LA. Allograft reconstruction for digital nerve loss. J Hand Surg Am. 2013;38(10):1965–71.

Karabekmez FE, Duymaz A, Moran SL. Early clinical outcomes with the use of decellularized nerve allograft for repair of sensory defects within the hand. Hand (N Y). 2009;4(3):245–9.

Tos P, Battiston B, Ciclamini D, Geuna S, Artiaco S. Primary repair of crush nerve injuries by means of biological tubulization with muscle-vein-combined grafts. Microsurgery. 2012;32(5):358–63.

Risitano G, Cavallaro G, Merrino T, Coppolino S, Ruggeri F. Clinical results and thoughts on sensory nerve repair by autologous vein graft in emergency hand reconstruction. Chir Main. 2002;21(3):194–7.

Alligand-Perrin P, Rabarin F, Jeudy J, Cesari B, Saint-Cast Y, Fouque PA, Raimbeau G. Vein conduit associated with microsurgical suture for complete collateral digital nerve severance. Orthop Traumatol Surg Res. 2011;97(4 Suppl):S16-20.

Laveaux C, Pauchot J, Obert L, Choserot V, Tropet Y. Retrospective monocentric comparative evaluation by sifting of vein grafts versus nerve grafts in palmar digital nerves defects. Report of 32 cases. Ann Chir Plast Esthet. 2010;55(1):19–34.

Chen C, Tang P, Zhang X. Reconstruction of proper digital nerve defects in the thumb using a pedicle nerve graft. Plast Reconstr Surg. 2012;130(5):1089–97.

Pilanci O, Ozel A, Basaran K, Celikdelen A, Berkoz O, Saydam FA, Kuvat SV. Is there a profit to use the lateral antebrachial cutaneous nerve as a graft source in digital nerve reconstruction? Microsurgery. 2014;34(5):367–71.

Inoue S, Ogino T, Tsutida H. Digital nerve grafting using the terminal branch of posterior interosseous nerve: a report of three cases. Hand Surg Int J Devot Hand Upper Limb Surg Relat Res J Asia-Pac Fed Soc Surg Hand. 2002;7(2):305–7.

Meek MF, Coert JH, Robinson PH. Poor results after nerve grafting in the upper extremity: Quo vadis? Microsurgery. 2005;25(5):396–402.

Al-Ghazal SK, McKiernan M, Khan K, McCann J. Results of clinical assessment after primary digital nerve repair. J Hand Surg (Edinburgh, Scotland). 1994;19(2):255–7.

Altissimi M, Mancini GB, Azzarà A. Results of primary repair of digital nerves. J Hand Surg (Edinburgh, Scotland). 1991;16(5):546–7.

Sullivan DJ. Results of digital neurorrhaphy in adults. J Hand Surg (Edinburgh, Scotland). 1985;10(1):41–4.

Segalman KA, Cook PA, Wang BH, Theisen L. Digital neurorrhaphy after the age of 60 years. J Reconstr Microsurg. 2001;17(2):85–8.

Voche P, Ouattara D. End-to-side neurorrhaphy for defects of palmar sensory digital nerves. Br J Plast Surg. 2005;58(2):239–44.

Pereira JH, Bowden RE, Gattuso JM, Norris RW. Comparison of results of repair of digital nerves by denatured muscle grafts and end-to-end sutures. J Hand Surg (Edinburgh, Scotland). 1991;16(5):519–23.

Mackinnon SE, Dellon AL. Clinical nerve reconstruction with a bioabsorbable polyglycolic acid tube. Plast Reconstr Surg. 1990;85(3):419–24.

Rinker B, Liau JY. A prospective randomized study comparing woven polyglycolic acid and autogenous vein conduits for reconstruction of digital nerve gaps. J Hand Surg. 2011;36(5):775–81.

Battiston B, Geuna S, Ferrero M, Tos P. Nerve repair by means of tubulization: literature review and personal clinical experience comparing biological and synthetic conduits for sensory nerve repair. Microsurgery. 2005;25(4):258–67.

Neubrech F, Heider S, Otte M, Hirche C, Kneser U, Kremer T. Nerve tubes for the repair of traumatic sensory nerve lesions of the hand: review and planning study for a randomised controlled multicentre trial. Handchir Mikrochir Plast Chir. 2016;48(3):148–54.

Lohmeyer JA, Kern Y, Schmauss D, Paprottka F, Stang F, Siemers F, Mailaender P, Machens HG. Prospective clinical study on digital nerve repair with collagen nerve conduits and review of literature. J Reconstr Microsurg. 2014;30(4):227–34.

Lohmeyer J, Zimmermann S, Sommer B, Machens HG, Lange T, Mailander P. Bridging peripheral nerve defects by means of nerve conduits. Chirurg. 2007;78(2):142–7.

Buncke G, Safa B, Thayer W, Greenberg J, Ingari J, Rinker B. Outcomes of short-gap sensory nerve injuries reconstructed with processed nerve allografts from a multicenter registry study. J Reconstr Microsurg. 2015;31(05):384–90.

Rinker B, Zoldos J, Weber RV, Ko J, Thayer W, Greenberg J, Leversedge FJ, Safa B, Buncke G. Use of processed nerve allografts to repair nerve injuries greater than 25 mm in the hand. Ann Plast Surg. 2017;78(6S Suppl 5):S292–S295.

Marcoccio I, Vigasio A. Muscle-in-vein nerve guide for secondary reconstruction in digital nerve lesions. J Hand Surg Am. 2010;35(9):1418–26.

Norris RW, Glasby MA, Gattuso JM, Bowden RE. Peripheral nerve repair in humans using muscle autografts. A new technique. J Bone Jt Surg Br. 1988;70(4):530–3.

Laveaux C, Pauchot J, Obert L, Choserot V, Tropet Y. Emergency management of traumatic collateral palmar digital nerve defect inferior to 30 mm by venous grafting. Report on 12 clinical cases. Chir Main. 2011;30(1):16–9.

Lee Y-H, Shieh S-J. Secondary nerve reconstruction using vein conduit grafts for neglected digital nerve injuries. Microsurgery. 2008;28(6):436–40.

Tang JB, Gu YQ, Song YS. Repair of digital nerve defect with autogenous vein graft during flexor tendon surgery in zone 2. J Hand Surg (Edinburgh, Scotland). 1993;18(4):449–53.

Walton RL, Brown RE, Matory WE Jr, Borah GL, Dolph JL. Autogenous vein graft repair of digital nerve defects in the finger: a retrospective clinical study. Plast Reconstr Surg. 1989;84(6):944–9 ( discussion 950–942 ).

He B, Zhu Q, Chai Y, Ding X, Tang J, Gu L, Xiang J, Yang Y, Zhu J, Liu X. Safety and efficacy evaluation of a human acellular nerve graft as a digital nerve scaffold: a prospective, multicentre controlled clinical trial. J Tissue Eng Regen Med. 2015;9(3):286–95.

Chiu DT, Strauch B. A prospective clinical evaluation of autogenous vein grafts used as a nerve conduit for distal sensory nerve defects of 3 cm or less. Plast Reconstr Surg. 1990;86(5):928–34.

Young L, Wray RC, Weeks PM. A randomized prospective comparison of fascicular and epineural digital nerve repairs. Plast Reconstr Surg. 1981;68(1):89–93.

Chow SP, Ng C. Can a divided digital nerve on one side of the finger be left unrepaired? J Hand Surg (Edinburgh, Scotland). 1993;18(5):629–30.

Calcagnotto GN, Braga Silva J. The treatment of digital nerve defects by the technique of vein conduit with nerve segment. A randomized prospective study. Chir Main. 2006;25(3–4):126–30.

Chen C, Tang P, Zhang X. Finger sensory reconstruction with transfer of the proper digital nerve dorsal branch. J Hand Surg. 2013;38(1):82–9.

Oruç M, Ozer K, Çolak Ö, Kankaya Y, Koçer U. Does crossover innervation really affect the clinical outcome? A comparison of outcome between unilateral and bilateral digital nerve repair. Neural Regen Res. 2016;11(9):1499–505.

Wang WZ, Crain GM, Baylis W, Tsai TM. Outcome of digital nerve injuries in adults. J Hand Surg Am. 1996;21(1):138–43.

Young VL, Wray RC, Weeks PM. The results of nerve grafting in the wrist and hand. Ann Plast Surg. 1980;5(3):212–5.

Stang F, Stollwerck P, Prommersberger KJ, van Schoonhoven J. Posterior interosseus nerve vs. medial cutaneous nerve of the forearm: differences in digital nerve reconstruction. Arch Orthop Trauma Surg. 2013;133(6):875–80.

Fakin RM, Calcagni M, Klein HJ, Giovanoli P. Long-term clinical outcome after epineural coaptation of digital nerves. J Hand Surg (Eur Vol). 2015;41(2):148–54.

Artiaco S, Tos P, Conforti LG, Geuna S, Battiston B. Termino-lateral nerve suture in lesions of the digital nerves: clinical experience and literature review. J Hand Surg Eur. 2010;35(2):109–14.

McFarlane RM, Mayer JR. Digital nerve grafts with the lateral antebrachial cutaneous nerve. J Hand Surg. 1976;1(3):169–73.

Poppen NK, McCarroll HR, Doyle JR, Niebauer JJ. Recovery of sensibility after suture of digital nerves. J Hand Surg. 1979;4(3):212–26.

Chevrollier J, Pedeutour B, Dap F, Dautel G. Evaluation of emergency nerve grafting for proper palmar digital nerve defects: a retrospective single centre study. Orthop Traumatol Surg Res. 2014;100(6):605–10.

Rose EH, Kowalski TA, Norris MS. The reversed venous arterialized nerve graft in digital nerve reconstruction across scarred beds. Plast Reconstr Surg. 1989;83(4):593–604.

Kim J, Lee YH, Kim MB, Lee SH, Baek GH. Innervated reverse digital artery island flap through bilateral neurorrhaphy using direct small branches of the proper digital nerve. Plast Reconstr Surg. 2015;135(6):1643–50.

Landwehrs GM, Brüser P. Clinical results of terminolateral neurorrhaphy in digital nerves. Handchir Mikrochir Plast Chir. 2008;40(5):318–21.

Unal MB, Gokkus K, Sirin E, Cansü E. Lateral antebrachial cutaneous nerve as a donor source for digital nerve grafting: a concept revisited. Open Orthop J. 2017;11(1):1041–8.

Andelkovic SZ, Lesic AR, Bumbasirevic MZ, Rasulic LG. The outcomes of 150 consecutive patients with digital nerve injuries treated in a single center. Turk Neurosurg. 2017;27(2):289–93.

Bulut T, Akgun U, Citlak A, Aslan C, Sener U, Sener M. Prognostic factors in sensory recovery after digital nerve repair. Acta Orthop Traumatol Turc. 2016;50(2):157–61.

Vahvanen V, Gripenberg L, Nuutinen P. Peripheral nerve injuries of the hand in children. A follow-up study of 38 patients. Scand J Plast Reconstr Surg. 1981;15(1):49–51.

Acar E, Turkmen F, Korucu IH, Karaduman M, Karalezli N. Outcomes of primary surgical repair of zone 2 digital nerve injury. Acta Orthop Belg. 2018;84(1):84–93.

Nunley JA, Ugino MR, Goldner RD, Regan N, Urbaniak JR. Use of the anterior branch of the medial antebrachial cutaneous nerve as a graft for the repair of defects of the digital nerve. J Bone Jt Surg Am. 1989;71(4):563–7.

Asplund M, Nilsson M, Jacobsson A, von Holst H. Incidence of traumatic peripheral nerve injuries and amputations in Sweden between 1998 and 2006. Neuroepidemiology. 2009;32(3):217–28.

Laroas G, Battiston B, Sard A, Ferrero M, Dellon AL. Digital nerve reconstruction with the bioabsorbable neurotube. Rivista Italiana di Chirurgia Plastica. 2003;35:125–8.

Waitayawinyu T, Parisi DM, Miller B, Luria S, Morton HJ, Chin SH, Trumble TE. A comparison of polyglycolic acid versus type 1 collagen bioabsorbable nerve conduits in a rat model: an alternative to autografting. J Hand Surg Am. 2007;32(10):1521–9.

Kennedy R. On the restoration of co-ordinated movements after nerve section. Proc R Soc Edinb. 1901;22:636–40.

Viterbo F, Trindade JC, Hoshino K, Mazzoni Neto A. Latero-terminal neurorrhaphy without removal of the epineural sheath. Experimental study in rats. Rev Paul Med. 1992;110(6):267–75.

Viterbo F, Trindade JC, Hoshino K, Mazzoni Neto A. End-to-side neurorrhaphy with removal of the epineurial sheath: an experimental study in rats. Plast Reconstr Surg. 1994;94(7):1038–47.

Viterbo F, Trindade JC, Hoshino K, Mazzoni A. Two end-to-side neurorrhaphies and nerve graft with removal of the epineural sheath: experimental study in rats. Br J Plast Surg. 1994;47(2):75–80.

Mennen U. End-to-side nerve suture in the human patient. Hand Surg. 1998;3(1):7–15.

Geuna S, Papalia I, Ronchi G, d’Alcontres FS, Natsis K, Papadopulos NA, Colonna MR. The reasons for end-to-side coaptation: How does lateral axon sprouting work? Neural Regen Res. 2017;12(4):529–33.

Li Q, Liu Z, Lu J, Shao W, Feng X. Transferring the ulnaris proper digital nerve of index finger and its dorsal branch to repair the thumb nerve avulsion. Zhongguo Xiu Fu Chong Jian Wai Ke Za Zhi. 2017;31(8):992–5.

Hohendorff B, Staub L, Fritsche E, Wartburg U. Sensible Nervenfunktion nach unilateraler digitaler Gefäß-Nerven-Verletzung: Nervennaht mit und ohne Arterienanastomose. Handchir Mikrochir Plast Chir. 2009;41(05):306–11.

Murakami T, Ikuta Y, Tsuge K. Relationship between the number of digital nerves sutured and sensory recovery in replanted fingers. J Reconstr Microsurg. 1985;1(4):283–6.

Thomas PR, Saunders RJ, Means KR. Comparison of digital nerve sensory recovery after repair using loupe or operating microscope magnification. J Hand Surg Eur. 2015;40(6):608–13.


Acknowledgements

Not applicable.

Author information

Authors and affiliations

Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, Dalian, 116011, Liaoning, China

Yi Zhang, Jian Zhang, Jiahui Liang, Xiaohu Chang & Xin Tang

Department of Hand and Foot Surgery, Zibo Central Hospital, No. 54 Gongqingtuan West Road, Zibo, Shandong, China

Yi Zhang, Nianzong Hou & Bing Xie

Department of Critical Care Medicine, Zibo Central Hospital, No. 54 Gongqingtuan West Road, Zibo, Shandong, China

Center of Gallbladder Disease, Shanghai East Hospital, Institute of Gallstone Disease, School of Medicine, Tongji University, Shanghai, China

Nianzong Hou


XT and YZ contributed to conception and design of the study, literature search, data extraction, methodological quality assessment, writing, and final approval; and XT and KW were involved in literature search, data extraction, methodological quality assessment, analysis, interpretation of data, and final approval; NH and JZ contributed to revision and final approval, and BX, JL, and XC were involved in supervision and final approval. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Kai Wang or Xin Tang .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Zhang, Y., Hou, N., Zhang, J. et al. Treatment options for digital nerve injury: a systematic review and meta-analysis. J Orthop Surg Res 18, 675 (2023).


Received : 25 April 2023

Accepted : 04 August 2023

Published : 12 September 2023



  • Digital nerve
  • Digital nerve injury
  • Digital nerve repair
  • Digital nerve reconstruction
  • Digital nerve gap repair

Journal of Orthopaedic Surgery and Research

ISSN: 1749-799X

Research, Analysis and Evaluation

CNA specializes in translating academic research into practice in the field. By harnessing research findings, sophisticated analysis and a robust understanding of systems, we encourage our nation’s criminal justice leaders to approach problems differently, with an evidence base that promotes safety, effectiveness, and efficiency. Our work includes a variety of research studies and after-action reviews.

Getting Back to Normal: Informing Renormalization of COVID-19

CNA customized SAFER-C to simulate the operations, environment, and virus spread within a representative 100-bed housing unit. The detailed insights gained from this analysis suggest that SAFER-C is a valuable tool for correctional decision-makers.

The Use and Effectiveness of Safety Equipment in Correctional Facilities Across the United States

CNA, the Association of State Correctional Administrators, and other partners collaborated with seven facilities to gather information about officer injuries and the use of safety equipment, and to examine policies and procedures related to safety and safety equipment.

Field Training Programs in Law Enforcement

This report highlights the perceptions about field training programs from the perspectives of both the field training officers and the trainees.

Impact on Field Training Incentives

Impacts on Field Training Programs: Pairing Trainers and Trainees

Impacts of Field Training Programs: Recruitment and Retention

Philadelphia Police Department's Response to Demonstrations and Civil Unrest

CNA’s independent after-action report provided an independent review of the Philadelphia Police Department's response to the 2020 mass demonstrations and civil unrest.

  • The Benefits of Body-Worn Cameras


This report provides insights into how body-worn cameras used by officers in the Las Vegas Police Department improved job performance, were cost-effective, and supported community relations.

  • Justice Solutions
  • Navigating Retaliatory Violent Disputes
  • Listening Session 1: Juvenile Justice System Crime Analysts
  • Listening Session 2: Juvenile Defenders on COVID-19 Policies for the Long-Term
  • Listening Session 3: State Juvenile Justice Agency Administrators
  • Listening Session 6: Juvenile Court Judges
  • Work and Life Stressors of Law Enforcement Personnel
  • Opioid Data Initiative
  • Common Operational Picture Technology in Law Enforcement: Three Case Studies
  • Las Vegas After-Action Assessment

Evaluation Research Analysis

01 Jul 2024 - 10 Nov 2024

PSYC511 or PSYCH511

PSYC510, PSYC512 and PSYC513

This paper provides an introduction to evaluation praxis, with a major focus on completing a small-scale evaluation of a social service or health programme. Students gain experience in roles such as consultant, advocate, liaison, and technician. Engaging with the client to refine and negotiate an evaluation plan is part of setting up the evaluation, collecting and analysing information, and presenting the results in the appropriate format(s). Students are expected to be active learners and to take lead responsibility for reporting evaluation progress to the client (which involves undertaking agreed tasks on time and reporting back on them). Emphasis is placed on qualitative methods, collaborative approaches, and evaluation as a strategy of incremental social change.

Teaching Periods and Locations

If your paper outline is not linked below, try the previous year's version of this paper.

Timetabled lectures and tutorials

Indicative fees

You will be sent an enrolment agreement which will confirm your fees. Tuition fees shown are indicative only and may change. There are additional fees and charges related to enrolment - please see the Table of Fees and Charges for more information.

Available subjects

Additional information

Subject regulations

  • Paper details current as of 27 Jan 2024 23:55pm
  • Indicative fees current as of 9 Apr 2024 01:30am

You’re viewing this website as a domestic student

You’re currently viewing the website as a domestic student, you might want to change to international.

You're a domestic student if you are:

  • A citizen of New Zealand or Australia
  • A New Zealand permanent resident

You're an International student if you are:

  • Intending to study on a student visa
  • Not a citizen of New Zealand or Australia

Original Research Article

Designing for Usability: Development and Evaluation of a Portable Minimally-Actuated Haptic Hand and Forearm Trainer for Unsupervised Stroke Rehabilitation

  • 1 Motor Learning and Neurorehabilitation Laboratory, ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
  • 2 Department of Cognitive Robotics, Delft University of Technology, Delft, Netherlands
  • 3 Department of Rehabilitation Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, Netherlands
  • 4 Rijndam Rehabilitation Center, Rotterdam, Netherlands

In stroke rehabilitation, simple robotic devices hold the potential to increase the training dosage in group therapies and to enable continued therapy at home after hospital discharge. However, we identified a lack of portable and cost-effective devices that not only focus on improving motor functions but also address sensory deficits. Thus, we designed a minimally-actuated hand training device that incorporates active grasping movements and passive pronosupination, complemented by a rehabilitative game with meaningful haptic feedback. Following a human-centered design approach, we conducted a usability study with 13 healthy participants, including three therapists. In a simulated unsupervised environment, the naive participants had to set up and use the device based on written instructions. Our mixed-methods approach included quantitative data from performance metrics, standardized questionnaires, and eye tracking, alongside qualitative feedback from semi-structured interviews. The study results highlighted the device's overall ease of setup and use, as well as its realistic haptic feedback. The eye-tracking analysis further suggested that participants felt safe during usage. Moreover, the study provided crucial insights for future improvements such as a more intuitive and comfortable wrist fixation, more natural pronosupination movements, and easier-to-follow instructions. Our research underscores the importance of continuous testing in the development process and offers significant contributions to the design of user-friendly, unsupervised neurorehabilitation technologies to improve sensorimotor stroke rehabilitation.

1 Introduction

Stroke is one of the main contributors to disability worldwide and its impact on society is expected to further increase in the future with aging populations ( Feigin et al., 2022 ). After stroke, the loss of upper-limb functions such as grasping and fine manipulation is particularly prevalent ( Lai et al., 2002 ; Kwakkel et al., 2003 ; Zbytniewska-Mégret et al., 2023 ) and affects the autonomy and quality of life of patients ( Mercier et al., 2001 ).

To maximize therapy outcomes, patients should undergo an intense and high-dosage (i.e., large number of repetitions and long overall training duration) neurorehabilitation program ( Kwakkel et al., 2004 ; Schneider et al., 2016 ; Ward et al., 2019 ; Tollár et al., 2021 ). Unfortunately, the dosage and intensity of current interventions that rely on one-to-one interactions between therapists and patients are often considerably lower than recommended due to organizational constraints and limited resources in the healthcare system ( Camillieri, 2019 ). This situation is expected to be further aggravated in the near future by the increasing financial pressure in healthcare and the global clinical staff shortage ( Haakenstad et al., 2022 ). Group therapy and home rehabilitation are two approaches that could mitigate this societal challenge. Group therapy can be as effective as dose-matched individual therapy ( Renner et al., 2016 ), while home-based rehabilitation provides the flexibility of location and can maintain therapy dosage following clinical discharge if exercises are frequently performed according to plan and executed correctly ( Hackett et al., 2002 ; Cramer et al., 2019 ; Chi et al., 2020 ). It has been suggested that sustained training at home might even be a prerequisite for minimizing subsequent losses in patients' quality of life ( Tollár et al., 2023 ). Thus, a paradigm shift is needed from conventional on-site labor-intensive motor rehabilitation to minimally supervised motor rehabilitation at the patient's home.

However, both group therapy and at-home rehabilitation require adequate tools that deliver high-dosage training and ensure patient engagement in minimally supervised (group therapy) or unsupervised environments (home rehabilitation). Sensorized or robotic devices in combination with interactive gamified exercises are promising candidates to drive this paradigm shift ( Chen et al., 2019 ; Handelzalts et al., 2021 ; Lambercy et al., 2021 ; Forbrigger et al., 2023a ). The effectiveness of robot-based group therapy in providing high-dosage, high-intensity therapy has already been demonstrated (e.g., Hesse et al., 2014 ), while the feasibility of sensorized or robotic devices to deliver high dosage in self-guided therapies at home has been largely endorsed (e.g., Sivan et al., 2014 ; Wittmann et al., 2016 ; Hyakutake et al., 2019 ; McCabe et al., 2019 ; Rozevink et al., 2021 ). Notably, there is even evidence that technology-based rehabilitation programs can outperform conventional (i.e., non-interactive exercises according to paper instructions) home rehabilitation ( Wilson et al., 2021 ; Swanson et al., 2023 ).

Among current technological solutions for unsupervised home rehabilitation, we can find non-actuated devices like the Armeo ® Senso (Hocoma AG, Switzerland), the FitMi (Flint Rehab, USA) or MERLIN ( Guillén-Climent et al., 2021 ). These sensorized devices typically track the patient's movements using sensors such as inertial measurement units (IMUs) or sensorized wheels (e.g., rotary encoders), allowing the patients' movements to be used as inputs for gamified exercises on a tablet, computer, or smartphone. Some devices, like the Gripable (GripAble Limited, United Kingdom), Pablo ® (TyroMotion GmbH, Austria), or the NeuroBall TM (Neurofenix, USA) additionally feature sensors to detect grip strength. While non-actuated devices have shown their feasibility to deliver high dosage in self-guided therapies at home ( Wittmann et al., 2016 ; Rozevink et al., 2021 ), they are limited in their capabilities to actively support or resist patients' hand movements. This is overcome with actuated robotic devices such as the PoRi, a compact hand-held device with one actuated degree of freedom (DoF) for grasping that includes haptic feedback vibro-tactile actuators ( Wolf et al., 2022 ). Other examples include the hCAAR, a robotic device for planar movements in the transversal plane ( Sivan et al., 2014 ), or the Motus Hand, a commercial device for wrist flexion/extension ( Wolf et al., 2015 ). Yet, while all these solutions seemed to be well suited for minimally supervised or unsupervised training, except for PoRi, they mostly target motor functions and neglect the training of somatosensory functions.

The execution of skillful movements relies on the integration of meaningful sensory information such as touch and proprioception ( Scott, 2004 ; Pettypiece et al., 2010 ) and the provision of such information during training is therefore highly recommended ( Bolognini et al., 2016 ; Handelzalts et al., 2021 ). In robotic training, such sensory information can be provided through haptic rendering—i.e., the generation of physical forces from interactions with tangible virtual objects ( Gassert and Dietz, 2018 ). Although multiple robotic rehabilitation devices have specifically been developed to address this (e.g., Metzger et al. 2011 ; Fong et al. 2017 ; Rätz et al. 2021a ), they are mostly intended for clinical rehabilitation. To the best of our knowledge, the recent ReHandyBot (Articares Pte Ltd, Singapore)—a commercial device based on the haptic tabletop device HandyBot ( Ranzani et al., 2023 ) with two DoF, i.e., grasping and pronosupination—is currently the only commercial portable upper-limb device intended for home use which was explicitly designed to also address sensory deficits.

Commercial devices also remain costly, limiting their adoption both for group therapy and at-home rehabilitation. A lack of cost-effectiveness was listed as one of the driving reasons for not recommending robot-assisted neurorehabilitation in adult post-stroke training by the United Kingdom National Institute for Health and Care Excellence guidelines in October 2023 ( NICE, 2023 ). These costs were not only associated with the device purchase (initial investment) but also with maintaining the equipment, the staff time for setting up the machine for each use, and the time to teach the patient how to use it. Further, it was noted that the machines were used with only a small subset of patients and so could not be used at their full capacity, increasing the cost per use and thus the overall intervention costs. We thus think that there is a further need for low-cost, versatile, intuitive, and highly portable hand rehabilitation devices that provide meaningful haptic feedback for use in minimally supervised or unsupervised settings.

Here, we present the design and results from a first usability test of our second prototype of a portable and low-cost haptic hand trainer based on a novel compliant shell mechanism. The device offers two degrees of freedom: an actuated one for grasping as well as haptic rendering, and a passive one for pronosupination movements. Meeting the stringent criteria for home rehabilitation devices is challenging ( Chen et al., 2019 ; Forbrigger et al., 2023b ). It has been shown that usability and users' perceptions of assistive devices and rehabilitation technology for home use substantially influence their long-term utilization ( Biddiss and Chau, 2007 ; Sivan et al., 2014 ; Sugawara et al., 2018 ; Ciortea et al., 2021 ). Therefore, we co-created the novel portable device with clinical personnel following a human-centered design approach to ensure efficient and goal-oriented development. For this purpose, we followed four phases when designing our solution: (i) understand the context of use; (ii) specify patients' requirements; (iii) design the solution; and (iv) evaluate against requirements. Insights from the first two phases are published in Rätz et al. (2021b) and Van Damme et al. (2022) . Here we report on the two later phases. We embraced a mixed-method approach to evaluate the usability of our invention, where quantitative methods such as questionnaires were combined with qualitative approaches like interviews. The use of standardized questionnaires is advocated, as it makes the results more meaningful and comparable ( Meyer et al., 2021 ). We included supplementary techniques like video recordings and eye-tracking to help gain further insights into the root causes of usability issues ( Goldberg and Wichansky, 2003 ; Schaarup et al., 2015 ; Maramba et al., 2019 ). For a comprehensive online guide on usability assessment methods, see Meyer et al. (2023) .

The rest of the paper is organized as follows: We first present the development of the second prototype of the novel portable hand trainer as well as the design of an accompanying rehabilitation game including the computation of interaction forces with virtual game objects. This development is the continuation of a concept that we proposed in Van Damme et al. (2022) . We then introduce the setup and methodology of a usability experiment with 13 healthy participants, including three therapists, in a simulated unsupervised scenario. Finally, we present the results and discuss their implications for further developments and studies in patients' homes. This paper evaluates our design choices and offers other device and rehabilitation game developers detailed insights into our learnings.

2 Materials and methods

2.1 Device development

2.1.1 Requirements

The first prototype of our portable hand trainer was developed and patented in 2022 ( Van Damme et al., 2022 ; Rätz et al., 2023 ). The core idea of this first prototype was a U-shaped compliant shell that is grasped with the entire hand—i.e., enclosed with fingers, thumb and palm—and could allow an extremely simple and inherently safe mechanical human-device interaction even in case of improper setup of the user's hand. This shell design mimics a natural large-diameter power grasp, i.e., simultaneous flexion or extension of all fingers with abducted thumb. This particular grasp was selected as it represents one of the most frequently employed hand movements in activities of daily living (ADL) ( Bullock et al., 2013 ) and is effectively trained in clinical rehabilitation ( Pandian et al., 2012 ). Importantly, our first prototype drastically minimized the risk of skin getting pinched in gaps between moving parts regardless of the exact hand proportions of the user, making it an excellent candidate for home rehabilitation. Finally, it also featured a highly-backdrivable transmission and offered good mechanical transparency, allowing for open-loop impedance control to achieve fine haptic rendering. While the initial prototype served as a preliminary proof of concept, a more elaborate version was clearly needed for a first usability study.

For the second generation of the portable hand trainer presented in this study, we defined the following improvements based on first evaluation tests, informal discussions with therapists of the Department of Neurology, University Hospital Bern, Switzerland, and the literature (e.g., Lu et al., 2011 ; Akbari et al., 2021 ; Li et al., 2021 ; Rätz et al., 2021b ): i) The device must be aesthetically pleasing and should look like a medical device. This includes integrating the electronics (except for the power supply and emergency stop) into the device housing. ii) A passive DoF shall be added for pronosupination movements. iii) Although training was feasible for various hand sizes with the first prototype, we found that shorter fingers would benefit from smaller shell sizes. In addition, we also aimed to increase the range of motion of fingers and thumb during grasping. iv) The device must be safe, as the ultimate goal is to use it in real unsupervised training. v) The device should remain as compact and portable as possible.

2.1.2 Shell design

The desired bending behavior of the shell during grasping—i.e., following the natural movement of the thumb and the fingers—is achieved by anchoring the shell center part to the device while moving the shell ends on circular paths. For this, the shell ends are connected to a thumb and finger lever ( Figure 1 ). These levers are coupled and actuated through a transmission (see Section 2.1.3).

Figure 1 . Overview of the shell actuation mechanism. Left : Top view of the device and shells with three different sizes. The fixations of the three shells are aligned. The shells are actuated at their ends through levers. Center : Top view of the two-stage transmission mechanism with indicated paths of the shell ends. Right : Bottom view of the transmission. The thumb and finger pulleys have different diameters, d 5 and d 4 respectively, to account for the different ranges of motion of the thumb and fingers. Note the belt clamp, which allows the use of a smooth pulley (with diameter d 4 ). The idler pulley is required for the routing of the synchronous belt around the smooth pulley.

We decided to design three shell sizes (small, medium, large) for this study's device, using the hand measurements of Garrett (1971) . The 5th percentile female hand was considered a small hand, the 95th percentile male hand a large one, and the average of both a medium-sized hand. It is important to note that when closing the hand, the arc length of the inner (i.e., palmar) side of the hand shortens—as easily visible by the skin creases beneath the finger joints—while the thin shell can be assumed to maintain the same arc length when bent. This results in a sliding motion of the fingers along the shell when closing the hand. However, in the first prototype, we found that this sliding is imperceptible to the user and does not pose a problem. In this second prototype, we used this knowledge in the design of three size-specific shell geometries. We designed the size-specific shells such that their ends approximately align with the fingertips (or thumb tip respectively) when the hand is extended ( Figure 3 ). With this and the aforementioned tendency of the fingers to slide backwards on the shell, we know that the fingertips will not collide when the shell is closed. The shell height is the same for the three sizes.

The lengths of the levers that move the shell ends and the locations of their respective center of motion were defined in an iterative process such that (i) the shell ends move along a natural fingertip path, and (ii) there is no collision between mechanical parts. For a quick change of the shells, each shell is mounted with a combination of removable rods at its ends and dowel pins at its center. Thereby, all three shells share a common center fixation, resulting in alignment of the shell center areas that support the thenar web space (i.e., the part between thumb and fingers). This allowed us to use the same wrist fixation for the three shell sizes.

2.1.3 Transmission and control

The device is actuated by an electric DC motor. The transmission is divided into a spur gear stage and a synchronous belt transmission stage. Hereby, the synchronous belt has two functions: First, it amplifies the motor torque, and second, it couples the thumb and the finger movements. Figure 1 shows the mechanism. Note the different diameters of the finger ( d 5 ) and thumb actuation ( d 4 ) pulleys, which allows us to take into account the different ranges of motion of the thumb and the fingers. The arc lengths of the finger and thumb paths are denoted s f and s t respectively, and are computed in Eq. (1) with d 1 , d 2 , d 3 , d 4 , and d 5 being the effective pulley/gear diameters, r f and r t the lever lengths (see Section 2.1.2), and θ m being the angular displacement of the motor shaft.

Applying the principle of virtual work in a static condition, we can thus compute the required motor torque τ m with Eq. (2) , using the partial derivatives of s f and s t with respect to the angular position of the motor shaft θ m and the thumb and fingertip forces F t and F f (i.e., the forces at the shell ends, along the circular path of the shell ends):

If we assume that the thumb and finger forces are equal and given by F = F t = F f , the motor torque is given by Eq. (3) . Note that this assumption does not necessarily strictly hold during use, as the additional hand-device contacts at the wrist fixation and thenar web space might result in a statically over-constrained situation where unequal forces to thumb and fingers could be applied. However, this assumption is required for the control of the one-DoF shell actuation.

Because the thumb and fingertip movements are coupled, we need to compute their combined displacement s and the speed ṡ with Eq. (4) :
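The relationships described for Eqs. (1)-(4) can be illustrated numerically. The sketch below is only a rough illustration: all dimension values, the gear-stage composition, and the use of a simple sum for the combined displacement are assumptions, not the device's actual parameters.

```python
# Illustrative sketch of the transmission kinematics (Eqs. 1-4).
# All geometry values, the assumed stage composition, and the simple sum
# for the combined displacement are placeholders for illustration only.

d1, d2, d3 = 0.010, 0.030, 0.015      # spur gear stage and belt drive (m)
d_finger, d_thumb = 0.040, 0.025      # finger and thumb pulley diameters (m)
r_f, r_t = 0.050, 0.040               # lever lengths (m)

# Constant ratios: arc length of the shell end per radian of motor angle.
# These equal the partial derivatives ds/dtheta_m used in the virtual-work
# torque computation.
c_f = r_f * (d1 / d2) * (d3 / d_finger)
c_t = r_t * (d1 / d2) * (d3 / d_thumb)

def arc_lengths(theta_m):
    """Finger and thumb arc lengths s_f, s_t for a motor angle theta_m."""
    return c_f * theta_m, c_t * theta_m

def motor_torque(F_f, F_t):
    """Static motor torque via the principle of virtual work:
    tau_m = F_f * ds_f/dtheta_m + F_t * ds_t/dtheta_m."""
    return F_f * c_f + F_t * c_t

def combined_displacement(theta_m, theta_m_dot):
    """Combined displacement s and speed s_dot of the coupled movement."""
    s_f, s_t = arc_lengths(theta_m)
    return s_f + s_t, (c_f + c_t) * theta_m_dot
```

With equal thumb and finger forces F, the torque reduces to tau_m = F * (c_f + c_t), matching the equal-force assumption stated above.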

The desired rendered force F is computed with Eq. (5) , depending on the desired visco-elastic characteristics (viscosity B and stiffness K ) of the virtual object/environment the participant may interact with.
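A minimal sketch of the visco-elastic rendering described for Eq. (5); the sign convention and the contact condition are assumptions for illustration:

```python
def rendered_force(s, s_dot, s0, K, B):
    """Visco-elastic virtual wall: once the combined displacement s passes
    the wall position s0, render a spring-damper force (stiffness K times
    penetration, plus viscosity B times speed); outside contact, render no
    force so the fingers move freely. Sign conventions are illustrative."""
    penetration = s - s0
    if penetration <= 0.0:
        return 0.0          # fingers not in contact with the virtual object
    return K * penetration + B * s_dot
```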

We increased the device transparency by compensating the inherent restitution force of the printed shell to allow the fingers to move freely when not in contact with virtual objects. For each shell size, we identified the inherent spring constant and offset by applying a constant motor torque in steps of 3 mNm and measuring the resulting angular motor position (and thus the shell deflection). The compensation torque τ m, c was then computed as the linear regression of the collected data points (see Figure 2 ). This leads to a final motor torque in Eq. (6) .

Figure 2 . Inherent shell restitution force measurements and resulting compensation torque τ m, c computed through linear regression for all three shell sizes.

2.1.4 Final prototype

The final prototype ( Figure 3 ) consists of a housing in which the transmission and electronic components are placed, a wrist rest, a button with integrated status LED, a DC motor, and the shell. The final prototype has a length of 210 mm, a width of 160 mm, and a height of 150 mm. The ranges of motion of the finger and thumb levers ( Figure 1 ) are approximately 80° and 40°, respectively. The majority of the device was 3D-printed in FDA-approved polylactic acid (PLA) plastic, while a few structural parts were printed in carbon-reinforced PLA or machined out of aluminum or stainless steel (e.g., shafts of the pulleys, vertical support rods at the ends of the shells). This results in a weight of 1030 g without the external power supply and emergency stops. For this study, a right-handed version was manufactured. The cost of one prototype unit without external emergency stop buttons is currently slightly below 1,000 CHF (approx. 1,000 EUR).

Figure 3 . (A) Overview of the portable hand trainer. (B) Demonstration of pronosupination movements and flexion/extension of the fingers.

Since the endpoints of the three differently sized shells move on different paths, each shell comes with a distinct cover for the frontal upper part of the device, which is fixated on the device with magnets (front cover in Figure 3A ). A shell can be exchanged in a few seconds by first pulling it vertically off the device (the vertical support rods can either stay in the shell or be removed separately), then exchanging the front cover and inserting the new shell (see video in Supplementary Material ).

The DC motor with an optical encoder (3272CR and IER3 with 4096 pulses, Faulhaber, Germany) is placed vertically outside the main housing and inside the shell, as this allows the most efficient use of space. The resulting maximal continuous force at the fingertips depends on the shell size, 15.7 N (small), 13.6 N (medium), and 11.9 N (large). The device is powered by an external medical-grade 12 V power supply and can be equipped with two emergency stop buttons in series (one for the user and one for a therapist or experimenter). An ESP-32 microcontroller (Espressif Systems, China) with an Escon Module 50/5 motor driver (Maxon, Switzerland) is used to control the device. The embedded control software was written in C++ and is based on the open-source FreeRTOS™ (Amazon, USA) real-time operating kernel, allowing for appropriate scheduling and prioritization of tasks. One of the two cores of the microprocessor runs a dedicated thread for the haptic rendering and motor control at 1 kHz. Games or rehabilitation exercises are executed on a host computer, with which the device communicates through a USB serial connection.

To enable pronosupination movements (i.e., tilting of the device, Figure 3B ), the edges of the bottom of the device were rounded. This allows smooth tilting of the device up to 15° to each side; however, the device can easily be tilted further while leaning on its edge. Because of the remaining flat part in the center, the device is still stable when positioned on a flat surface. The tilting angle is measured with an LSM6DSOX (Adafruit, USA) inertial measurement unit (IMU). A wrist strap was added to fix the user's hand in position and to facilitate pronosupination movements. The size of the wrist strap is adjustable via hook and loop and can be opened with a magnetic lock (Fidlock GmbH, Germany). Another strap was attached at the distal position on the shell to fixate the fingertips ( Figure 3A ). This strap can be opened with a hook and loop fixation; however, this is not necessarily required, as the fingers can simply be slid in.

Before use, the device needs to go through a short calibration step. Upon a three-second press on the device button, a calibration is performed by slowly closing the shell with a proportional-integral (PI) velocity controller and detecting the sudden increase in controller output when the mechanical limit is hit. This calibration step is required to obtain a mapping from the motor shaft angle—measured by an incremental encoder—to the shell endpoint positions, i.e., the opening of the shell. The shell restitution compensation is executed according to the shell size, which must be provided through the host computer prior to the initialization. After completion of the calibration routine, the status LED turns on.
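The calibration step can be sketched as a simple loop: close the shell with a PI velocity controller and declare the mechanical limit found when the controller output suddenly rises. The gains, thresholds, and the mock shell dynamics below are all invented for illustration and are not the device's firmware values.

```python
# Sketch of the calibration routine; hardware is replaced by a crude mock.

class MockShell:
    """Stand-in for the real hardware: velocity follows the applied torque
    until the shell hits its mechanical stop, where motion is blocked."""
    def __init__(self, limit_angle=-1.2):
        self.angle, self.velocity, self.limit = 0.0, 0.0, limit_angle

    def apply_torque(self, tau, dt):
        blocked = self.angle <= self.limit and tau < 0.0
        self.velocity = 0.0 if blocked else 10.0 * tau
        self.angle += self.velocity * dt

def find_mechanical_limit(shell, v_ref=-0.1, kp=0.05, ki=1.0, dt=0.001,
                          effort_limit=0.1, max_steps=20000):
    """Close the shell slowly with a PI velocity controller; when the stop
    is reached the integrator winds up and the controller output jumps,
    which we detect as the mechanical limit."""
    integral = 0.0
    for _ in range(max_steps):
        error = v_ref - shell.velocity
        integral += error * dt
        effort = kp * error + ki * integral
        if abs(effort) > effort_limit:   # sudden rise: stop reached
            return shell.angle
        shell.apply_torque(effort, dt)
    raise TimeoutError("mechanical limit not detected")
```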

2.2 Serious game

2.2.1 Requirements

We complemented our device with a rehabilitation exercise in the form of a serious game. A preliminary version of the game is described in Van Damme et al. (2022) . Alongside other literature (e.g., Burke et al., 2009 ; Lohse et al., 2013 ; Li et al., 2021 ), we based the development on the results of a survey that we performed among clinical personnel ( Rätz et al., 2021b ). We specified the following requirements for the game in this study: i) Ensure that the device can effectively showcase its available movements—i.e., power grasp and pronosupination movements. ii) The task in the game should resemble ADL, while still being entertaining. iii) To motivate participants, we aimed to incorporate a challenge-based component, e.g., limited game life, time constraints, and a scoring system. iv) The users shall be encouraged to actively extend their fingers during the exercise. v) The game should contain sub-tasks with different degrees of difficulty. In Rätz et al. (2021b) , we found that adjustability of difficulty was desired by clinical personnel. We decided to abstain from adjustable settings in this study to keep the main focus on the evaluation of the device. vi) Provide meaningful, congruent, and diverse haptic feedback during interactions with virtual objects in the game.

2.2.2 Game design

A screenshot of the designed serious game that satisfies the aforementioned requirements is shown in Figure 4 . The game was developed in Unity3D (Unity Technologies, USA) and represents a cocktail bar with four differently colored liquid dispensers, each one having a glass beneath it. The goal is to fill the glasses by grasping the liquid dispensers with a virtual hand avatar by skillfully squeezing the shell. If the liquid dispensers are grasped too strongly, the liquid starts to spill, which results in lost game life, indicated with a life bar located at the top left of the screen. If the liquid dispensers are not squeezed strongly enough, no or only very little liquid will come out. A timer on the upper right corner of the screen indicates the remaining time. If glasses are overfilled and liquid flows over, the life bar also decreases. For each filled-up glass, the text “Full” appears and a point is added to the user's score, shown on the right of the screen.

Figure 4 . Serious game with four liquid dispensers and hand avatar. The goal is to skillfully squeeze the liquid dispensers to pour liquids into the glasses without spilling any liquid.

The game simulates grasping by mimicking the sensation of physically squeezing a liquid dispenser made of a visco-elastic material. When the user squeezes the shell, the virtual hand avatar first moves forward to the liquid dispenser in front of it and, when squeezed further, the dispenser is grasped. The variable s 0 from Eq. 5 is defined such that the rendering of the virtual wall starts at this point. Each liquid dispenser possesses different characteristics, i.e., a different pair of stiffness K and damping B in Eq. 5 ( K ∈{0.6, 0.1, 0.3, 0.01} N/mm and B ∈{0.005, 0.001, 0.005, 0} Ns/mm), empirically selected and corresponding to the dispensers from left to right. Notably, the displayed behavior of the liquid matches the haptic rendering: a liquid dispenser with higher impedance (i.e., higher values of K and B ) contains a sticky, viscous liquid, while a lower impedance indicates a runny liquid.
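
Based on the parameters reported above, the per-dispenser rendering of Eq. 5 can be sketched as a unilateral spring-damper. The function signature and the contact condition s > s0 are assumptions for illustration, not the published implementation.

```python
# Hedged sketch of the per-dispenser impedance rendering (Eq. 5).
# (K [N/mm], B [Ns/mm]) pairs from left to right, as reported in the text.
DISPENSERS = {
    0: (0.6, 0.005),
    1: (0.1, 0.001),
    2: (0.3, 0.005),
    3: (0.01, 0.0),
}

def rendered_force(dispenser, s, s_dot, s0):
    """Spring-damper force fed back through the shell once the squeeze
    position s passes the virtual-wall onset s0; zero in free motion."""
    K, B = DISPENSERS[dispenser]
    if s <= s0:
        return 0.0
    return K * (s - s0) + B * s_dot
```

A stiffer, more damped pair (leftmost dispenser) renders the "sticky, viscous" liquid, while the near-zero pair (rightmost) renders the runny one.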

In the first phase, each one of the four glasses needs to be filled once. To move the virtual hand to grasp different dispensers, the fingers need to be extended, and the device tilted—i.e., performing a pronosupination movement. When the IMU detects tilting of more than 5°, the hand avatar moves one step (i.e., one liquid dispenser) in the corresponding tilting direction. After keeping the device tilted for 0.8 s, the avatar continues moving to the next position and so forth, until the device tilting angle is below 5° again. These values were defined through preliminary testing by the developers. To switch position again after a liquid dispenser has been grasped, the hand must be opened again, necessitating active finger extension as specified in the requirements. Once the first four glasses are filled, glasses start to appear randomly. If the life bar is empty, the score is reset to zero and the first phase starts again.
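
The tilt-to-move behavior described above (5° threshold, one step immediately, a further step after 0.8 s of sustained tilt, and no switching while a dispenser is grasped) can be sketched as a small state machine; the class and parameter names are illustrative, not the game code.

```python
# Hedged sketch of the IMU tilt-to-move logic.

class TiltNavigator:
    THRESHOLD_DEG = 5.0   # minimum tilt to trigger a step
    REPEAT_S = 0.8        # hold time before the avatar steps again

    def __init__(self, n_positions=4):
        self.n = n_positions
        self.pos = 0
        self.held_since = None  # time of the last triggered step

    def update(self, tilt_deg, t, hand_open=True):
        """Return the avatar's dispenser index given the tilt angle at
        time t. A grasped dispenser blocks switching until the hand opens."""
        if abs(tilt_deg) < self.THRESHOLD_DEG or not hand_open:
            self.held_since = None
            return self.pos
        if self.held_since is None or t - self.held_since >= self.REPEAT_S:
            step = 1 if tilt_deg > 0 else -1
            self.pos = min(max(self.pos + step, 0), self.n - 1)
            self.held_since = t
        return self.pos
```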

2.3 Usability evaluation

2.3.1 Participants

A total of 13 healthy participants took part in the usability evaluation of our haptic device (six male, six female, and one non-binary; ages between 21 and 64 years; twelve right-handed and one left-handed). Of the 13 participants, three were neurorehabilitation physiotherapists from Rijndam Rehabilitation Center, Rotterdam, the Netherlands. The other ten participants (referred to as non-expert participants in this study) were healthy adults recruited through word of mouth at the Delft University of Technology, Delft, the Netherlands. In the following, we refer to participants by the pseudonyms T1–T3 (therapists) and N1–N10 (non-expert participants). All participants were naive to the experiment and the haptic device. The study was approved by the Human Research Ethics Committee (HREC) of the Delft University of Technology (Application ID: 2216).

2.3.2 Experimental setup and procedure

An unsupervised rehabilitation scenario was reproduced in two different locations. For the non-expert participants, we performed the experiment in a room (approximately 10 m × 5 m) with a table (2.5 m × 1 m) and a height-adjustable chair with backrest. On top of the table, we placed the hand trainer, a laptop, and one emergency stop button. The hand trainer was always placed on the right side beside the laptop, while the emergency stop button was placed on the left side in an easily reachable position. The entire setup was facing a short wall of the room. For the physiotherapists, the experiment was performed at the rehabilitation center in an office (approximately 6 m × 5 m) with a similar setup. A physiotherapist, who was familiar with the device but not involved in its development, led the experiment. One of the device developers was also present to provide support in case of technical difficulties.

The experiment started with obtaining the participants' written consent. Their hand size was then measured and a shell size—i.e., small, medium, large—was suggested by the experimenter according to a predefined size correspondence table. However, participants could switch to a different size after trying the recommended size if desired. The swapping of the shell was performed by the experimenter and is not part of the usability evaluation because in a real-life setting, this would be performed by the therapist and the patient would receive a device where the correct shell is already installed. Five participants felt the most comfortable with the small shell, while the other eight chose the medium size.

After selecting the shell size, participants were equipped with eye-tracking glasses (Tobii Pro Glasses 2, Tobii, Sweden). They were allowed to wear the eye-tracking glasses on top of their prescription glasses. The eye-tracking glasses were calibrated for each user following the manufacturer's guidelines for optimal performance. Participants whose calibration could not be performed successfully due to their prescription glasses were excluded from the eye-tracking analysis. The glasses recorded a video of the participant's point of view and a sequence of gaze points (i.e., where they were looking). In addition to the eye-tracking glasses, the experiment was recorded with a video camera, allowing us to measure setup times and identify practical and technical issues after the experiment.

The participants were then invited to sit on the chair and follow the instructions on the laptop screen. They were asked to seek the help of the experimenters only in case of emergency or if they could not continue by themselves. In the case of the non-expert participants, the experimenters moved behind a movable wall equipped with a second emergency stop button but did not leave the room to ensure the participants' safety. At the rehabilitation center, the experimenters positioned themselves diagonally behind the physical therapists to stay out of their line of sight and simulate the minimally supervised scenario while still ensuring the participants' safety with the second emergency stop.

Participants were then asked to follow the instructions presented on the laptop screen through a series of slides related to the device setup, playing the game, and device doffing. The slides related to the device setup instructions included how to turn on the device, how to don the hand, the game instructions, and how to use the emergency button. The device could be turned on by pressing the device button ( Figure 3 ) for at least three seconds. The participants could move to the next instruction slide with a short press of the same device button. After the instructions related to the device setup, a new slide prompted participants to play the game for five minutes. The remaining gaming time was displayed in the upper right corner during the game. When the time was up, a new slide with instructions on turning off the device and releasing the hand appeared. The entire set of instruction slides can be found in the Supplementary material . After the experiment, participants were asked to complete several questionnaires (see Section 2.3.3) and invited to share their experiences in a semi-structured interview. The audio of the interview was recorded for later analysis.

2.3.3 Outcome measures

We defined a variety of quantitative and qualitative outcome measures to assess the usability of the device as well as the participants' motivation and workload. First, the lead experimenter manually recorded the set-up time, i.e., the time required to turn on, don, and doff the device. We also noted the number of issues that occurred during the experiment, distinguishing between practical issues (e.g., when the participant visibly misunderstood the instructions or did not know how to proceed) and technical issues (e.g., issues related to the device or the game). In each case, we further noted whether intervention from the experimenters was required to continue the experiment. In cases where the experimenter did not have a clear view of the participant, the recorded video was consulted ad hoc .

We assessed the participants' subjective perception of the system's usability with two questionnaires. We selected the Post-Study System Usability Questionnaire (PSSUQ) ( Lewis, 2002 ) for the entire system (i.e., game and device). It consists of 16 seven-point Likert-style items and is divided into three subscales: System Usefulness (i.e., satisfaction, simplicity, and comfort), Information Quality (i.e., if and how relevant information is presented), and Interface Quality (i.e., interaction with the device and game). For an isolated assessment of the device, we additionally employed the shorter System Usability Scale (SUS) questionnaire ( Brooke, 1996 ), which consists of ten five-point Likert-style items. We chose the PSSUQ for the entire system as it exhibits finer granularity and the SUS for the isolated assessment of the device since the PSSUQ contains questions that only make sense in the presence of a software or information component.
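
For reference, the SUS is conventionally scored by transforming each of the ten 1–5 responses and scaling the sum to a 0–100 range (Brooke, 1996); a minimal sketch:

```python
def sus_score(responses):
    """Standard SUS scoring: odd items contribute (r - 1), even items
    (5 - r); the sum of contributions is scaled by 2.5 to reach 0-100."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5
```

The alternating transform accounts for the SUS's mix of positively and negatively worded items.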

The fact that the cognitive capabilities of stroke patients are often affected (e.g., see Mercier et al., 2001 ) motivated us to also investigate the mental load of our participants when using the system. We utilized the raw NASA Task Load Index (RTLX) ( Hart, 2006 ), a widely used questionnaire in usability testing ( Meyer et al., 2021 ). The RTLX assesses six individual domains, namely the mental, physical, and temporal (i.e., perceived time pressure) demand, the perceived performance, effort (i.e., the effort needed to achieve the performance), and the level of frustration. Each domain is assessed through a single 21-point Likert-style item, whereby zero reflects “very low” (or “perfect” in the performance item) and 20 “very high” (or “failure” in the performance item).

Since motivation is known to be a strong driver of effort and participation in robotic training of stroke patients ( Sivan et al., 2014 ), we also included items from the Interest/Enjoyment and the Perceived Competence subscales of the Intrinsic Motivation Inventory (IMI) ( McAuley et al., 1989 ). All questionnaire scores were normalized to a range from 0 to 100 for a more straightforward interpretation of the results. The PSSUQ and IMI subscale scores for each participant were computed as the arithmetic average of the corresponding items, and the overall PSSUQ and SUS scores as the average over all items.
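
The paper does not detail the normalization procedure; one plausible sketch is a linear mapping of each Likert response onto 0–100 followed by the arithmetic averaging described above. The function names are illustrative.

```python
def normalize_item(response, low, high):
    """Map a Likert response on [low, high] linearly onto [0, 100]
    (an assumed implementation of the stated normalization)."""
    return (response - low) / (high - low) * 100.0

def subscale_score(responses, low, high):
    """Subscale score as the arithmetic mean of the normalized items,
    as described for the PSSUQ and IMI subscales."""
    return sum(normalize_item(r, low, high) for r in responses) / len(responses)
```

For a seven-point item, for example, a response of 4 maps to 50 on the normalized scale.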

We employed the recorded eye-tracking data, i.e., the participants' points of view and accompanying gaze points, to identify the time participants spent looking at different elements of the experimental setup while playing the game. In the context of usability, the proportion of time spent looking at an element ( gaze point rate ) may reflect the importance of that element or could indicate difficulties in understanding an element ( Jacob and Karn, 2003 ). This was achieved by counting the number of gaze points per participant landing on six different rectangular areas of interest (AOIs, Figure 5 ), representing elements of the experimental setup: the device, emergency stop, game (i.e., dispensers and glasses), life bar, score, and remaining time. The number of gaze points landing on the different AOIs was determined per participant from the eye-tracking videos using the AOI tool of the Tobii Pro Lab software (version 1.217, Tobii, Sweden). The AOIs were manually adjusted for keyframes, i.e., individual frames of the videos, at the beginning and end of head movements to ensure that the AOIs were accurately placed on top of their corresponding element. The AOIs' positions and sizes were then linearly interpolated between keyframes. We normalized the number of gaze points per participant and AOI, n_AOI, over the total number of gaze points per participant, n_total, to remove the effect of unequal dataset sizes between participants (n̂_AOI = n_AOI/n_total). The gaze point rates n̂_AOI were multiplied by the time spent playing the game (300 s) to calculate the total time participants looked at each of the AOIs.
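
The normalization and time computation described above can be sketched as follows; the dictionary-based interface is an assumption for illustration.

```python
def gaze_times(counts_per_aoi, n_total, play_time_s=300.0):
    """Normalize per-AOI gaze-point counts by the participant's total
    gaze-point count (n̂_AOI = n_AOI / n_total) and convert the rates
    into looking times over the game duration."""
    rates = {aoi: n / n_total for aoi, n in counts_per_aoi.items()}
    times = {aoi: r * play_time_s for aoi, r in rates.items()}
    return rates, times
```

Dividing by the participant's total count (rather than the sum over AOIs) removes the effect of unequal dataset sizes between participants, as gaze points landing outside all AOIs still contribute to the total.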

Figure 5 . Exemplary frame from the video recorded by the Tobii glasses for participant N8. The six different rectangular areas of interest (AOIs) are highlighted in different colors.

Finally, we gathered qualitative data through open-ended questions (see Supplementary material ) in semi-structured interviews. These questions served as initial prompts to guide the discussion, though the experimenters were free to ask follow-up questions, allowing them to explore topics that seemed particularly important to the individual participant. The audio recordings of the semi-structured interviews were transcribed locally on a computer with a custom software pipeline written in Python. First, a diarization (i.e., partitioning of the audio into segments according to the speaker) was performed with simple-diarizer ( Simple Diarizer, 2023 ) using the xvec model and spectral clustering. The verbatim transcription was then performed with faster-whisper ( Faster Whisper, 2023 ), a re-implementation of the automatic speech recognition system Whisper ( Radford et al., 2023 ), using the pretrained medium-size model. Afterwards, the transcriptions were manually checked and corrected against the audio recordings. A thematic analysis was then performed to determine the principal themes (i.e., recurring patterns, opinions, and ideas) that emerged from the interviews. This methodology involves a systematic examination of the data, wherein text segments are assigned descriptive labels known as codes. These codes with the accompanying text segments are then categorized into cohesive themes, which are subsequently summarized and reported. For a comprehensive description of the procedure, please refer to Braun and Clarke (2008) .
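
The step that merges the diarization output with the transcription is not detailed in the text; one plausible sketch assigns each transcribed segment to the diarized speaker with maximal temporal overlap. The segment tuple formats are assumptions.

```python
def assign_speakers(diar_segments, asr_segments):
    """Label each transcribed segment with the diarized speaker whose
    segment overlaps it the most. diar_segments: (speaker, start, end);
    asr_segments: (text, start, end). A hypothetical merging step."""
    labeled = []
    for text, a_start, a_end in asr_segments:
        best, best_overlap = None, 0.0
        for speaker, d_start, d_end in diar_segments:
            overlap = min(a_end, d_end) - max(a_start, d_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labeled.append((best, text))
    return labeled
```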

3 Results

All participants except N10 completed all steps of the experiment. The experiment with this participant was ended prematurely by the experimenters while the participant was playing the game, due to technical problems with the device (see Table 1 ); however, participant N10 completed the rest of the experiment (i.e., questionnaires and interview) according to the protocol.

Table 1 . Technical and practical issues during setup and game play.

3.1 Setup time, technical issues and practical issues

The setup time measurements are depicted in Figure 6 . The overall median time (first quartile, third quartile) that participants spent with the device setup was 58 (47, 63) s. In particular, turning on the device took 6 (5, 10) s, while the subsequent donning took 41 (33, 53) s. Finally, the doffing was again relatively quick, with a duration of 7 (3, 8) s.
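
The median (first quartile, third quartile) summaries used throughout these results can be reproduced with Python's statistics module; the quartile method is an assumption, as the paper does not state one.

```python
from statistics import median, quantiles

def summarize(samples):
    """Report a sample as median (first quartile, third quartile),
    the format used for the setup times above."""
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    return median(samples), q1, q3
```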

Figure 6 . Box plots of the setup time, subdivided into turning on, donning, and doffing. The whiskers extend to ±1.5 inter-quartile range (IQR) from the nearest hinge.

The encountered technical and practical issues are summarized in Table 1 . Overall, ten practical and four technical issues were observed. The most frequent issue, with five occurrences, was participants not properly using the magnetic wrist strap. One technical error (N10) led to the premature ending of the experiment for safety reasons, since the technical root cause for this event was unknown at that time.

3.2 Questionnaires

The normalized scores of the questionnaires are summarized in Figure 7 . Because of the ordinal nature of the results from the various questionnaires ( Sullivan and Artino, 2013 ), we represent the central tendency using the median with first and third quartiles.

Figure 7 . Normalized scores from the questionnaires. PSSUQ: SU, System Usefulness; InfQ, Information Quality; IntQ, Interface Quality. SUS: SUS, Total score. RTLX: MD, Mental Demand; PD, Physical Demand; TD, Temporal Demand; P, Performance; E, Effort; F, Frustration. IMI: EI, Enjoyment/Interest; PC, Perceived Competence. The whiskers extend to 1.5 IQR from the nearest hinge.

Regarding the usability questionnaires, the PSSUQ, which was applied to the entire system, achieved an overall rating of 70.2 (65.6, 85.6) out of 100. Hereby, the System Usefulness subscale scored the highest with 83.3 (69.4, 83.3), followed by the Information Quality with 73.3 (50.0, 90.0) and Interface Quality with 66.7 (50.0, 83.3). The isolated device usability rating from the SUS achieved a score of 77.5 (72.5, 82.5). According to Bangor et al. (2009) , SUS values of 50.9–71.4, 71.4–85.5, and 85.5–90.9 correspond to OK–good, good–excellent, and excellent–best-imaginable usability, respectively. Note that previous studies have shown that the PSSUQ and the SUS questionnaires are highly correlated ( Vlachogianni and Tselios, 2023 ).

The assessment with the RTLX showed a mental demand of 25.0 (15.0, 45.0), a physical demand of 25.0 (15.0, 55.0), and a temporal demand of 20.0 (5.0, 45.0). Furthermore, it revealed that participants rated their performance with 70.0 (55.0, 80.0), which they achieved with a perceived effort of 45.0 (30.0, 55.0). Hereby, they rated their frustration level as 20.0 (15.0, 30.0) out of 100. In general, low values of the RTLX items indicate a low workload, except for the performance item, where a high value indicates good perceived performance.

Finally, regarding motivation, the overall IMI Interest/Enjoyment subscale score reached 64.3 (57.1, 76.2) out of 100 and the Perceived Competence subscale reached a score of 63.9 (58.3, 69.4). High scores in the IMI subscales relate to high enjoyment and high perceived competence, respectively.

3.3 Gaze point rates per AOI

The results of the gaze point rate per AOI are shown in Figure 8 . Two of the eye-tracking datasets were removed due to failed calibration procedures (N1, T2), one due to technical issues with the data (faulty battery, N7), and one because of the premature termination of playing the game (N10), leaving nine out of 13 datasets. The screen area with the cocktail glasses and dispensers (i.e., the game AOI) obtained the highest normalized hit rate with 87.0% (4.3%) (average and standard deviation). Notably, participants T1 and N5 spent a considerable amount of time looking at the life bar (12.2 s and 7.1 s, respectively), while participants N3 and N8 spent more time looking at the device than their peers (5.0 s and 3.5 s, respectively). Overall, the hit rates of 0.42% (0.57%) on the device and 0.03% (0.07%) on the emergency stop, with resulting average durations of only 1.27 s and 0.097 s, respectively, were low in comparison with the other AOIs.

Figure 8 . Gaze point rates per AOI for each participant with eye-tracking (nine out of the 13).

3.4 Semi-structured interviews

The thematic analysis led to the classification of 495 quotations, resulting in the assignment of 86 codes, which we then organized into seven groups: General Impressions, Pronosupination Movements, Instructions, Game, Comfort, Grasping with Haptic Rendering , and Application & Clinical Use . In the following, we present the main findings for each group with examples of supporting participant statements.

3.4.1 General impressions

The participants liked the sleek and simplistic design of the device. The majority of the participants appreciated having only one button for all functions, as it simplified the user experience and reduced the need to remember multiple buttons. One participant expressed concerns about accidentally turning the device off.

“ It's quite portable, it's looking sleek, it has nice curves” (T1)

“ I think it's very simple so that's great.” (N6)

“ I like that there's only one button, because it's just easy” (N4)

The weight and size of the device was generally considered acceptable. Some suggested making it slightly lighter, while others thought it provided stability.

“ I think it's nice that it's heavy when you have to move it, because then you really feel that it's rolling through.” (N4)

3.4.2 Pronosupination movements

Seven participants mentioned that the device tilting action to move between dispensers felt clunky and less responsive than expected. They struggled with the step-by-step movement of the hand avatar when tilting and were unsure if the hand needed to stay tilted to move multiple dispenser positions.

“ And the turning to the left and right was very... It was taking steps. I thought it was more fluid, but it was taking steps.” (T2)

Furthermore, some participants stated that the tilting felt counter-intuitive at first, as the design itself did not suggest that the device was meant to be tilted.

“ It didn't feel very intuitive when I was moving it left and right. Because I would imagine if it's a device that's supposed to rotate it would have something at the bottom that's not flat.” (N6)

3.4.3 Instructions

The reported feelings about the setup and game instructions were mixed. While some participants complimented the simplicity, seven participants mentioned that the instructions were not clear enough and raised concerns about the cognitive load, especially for users with potential cognitive impairments. They recommended simplifying the instructions, making them less information-dense. Furthermore, it was repeatedly suggested that step-by-step video demonstrations or looping animations might be more informative and easier to follow.

“ Very clear. And concise. Yeah no it was clear.” (N5)

“ I think it's more understandable if I see a 5-minute video and see this is the procedure, then there is no need to read something.” (N8)

In particular, for the magnetic wrist strap, the participants wished to obtain more detailed information about the exact opening mechanism. Several participants were initially confused about the magnetic mechanism of the wrist strap. Some did not realize that it could be opened and instead released the adjacent hook and loop. It was also mentioned that the color coding of the parts could be improved (e.g., finger strap), and should be chosen more carefully to represent their respective importance during the setup. For example, the wrist strap locking mechanism should be visually more highlighted than the finger strap adjustment as it is required to be opened every time during setup, while the finger strap only needs to be adjusted occasionally.

“ The only problem I had was with the wrist strap. It says open the lock which I interpreted as just open the hook and loop.” (N8)

“ Yes, but there's a red strap here so at first I was just like this because I read quickly and I didn't really understand [...] maybe this [finger strap red part] shouldn't be highlighted more than this [wrist fixation].” (N5)

Participant T2 mentioned having read only a little bit of the instructions, and participant N8 admitted clicking through the instructions without following them.

3.4.4 Game

The game was generally perceived as fun and enjoyable to play for the given time. Although some participants struggled to some extent with the pronosupination movements to move the virtual hand sideways, the game appeared to be intuitive for most participants.

“ The game, yes, it was funny. I wouldn't play it for hours, of course, but I think it's intuitive and fun.” (N5)

Five participants reported that the concept of the life bar was not fully understood or that the life bar was not even noticed for the majority of the time. It was suggested to make the life bar visually more dominant or to explain it better during the instructions.

“ The position of the bar needs to be closer to what's happening. Or there needs to be some visual connection.” (N6)

Yet, a few participants noted that the game was boring or could quickly become boring. In this context, some participants expressed their disappointment that the score was not saved and that there was no high score they could beat. It was suggested that a more competitive setting—even if it is just beating one's own score—would increase their motivation and interest in the game. Furthermore, more levels with increased difficulty would help to maintain motivation during longer sessions. The timer was mostly appreciated as a motivational element, although one participant perceived it as stressful.

“ It also wasn't really clear to me what my previous score was, so what score should I beat? Because it was a fun game to play, I would like to be competitive.” (N1)

“ Just shortly doing it is okay but playing it longer will be very boring for me.” (T2)

3.4.5 Comfort

Participants found the device generally comfortable and safe. Nevertheless, concerns about the wrist position and angle during prolonged use were raised, especially for persons with a paretic upper limb. Due to the height of the device, the hand was in an elevated position with respect to the elbow, resulting in a slight ulnar abduction.

“ It is quite comfortable. I was like in a relaxing pose. It was not stressing my hand, it is also very smooth and it is not too tight.” (N2)

“ The position of my wrist was a little bit uncomfortable, I think because it was elevated from the table.” (N8)

Two participants found the finger strap adjustment slightly finicky due to the limited space to attach the hook and loop on the shell. One participant desired to have the finger strap in a more proximal position. One participant pointed out that the thumb's position was somewhat unclear, and three suggested that a thumb strap might be helpful during extension movements.

3.4.6 Grasping with haptic rendering

Participants generally found the grasping motion easy to perform. Most of them appreciated the realistic grasping sensation and how the haptic feedback correlated with their actions.

“ At the beginning I was looking at the device to see where my fingers were, but at some point, I was just not looking anymore because of the haptic feedback. It was nice.” (N5)

“ Really cool, how the grasping really works nicely with the feedback, it really felt like I had some nice feedback, yeah it worked well” (N1)

However, a few also reported that the visuals played a predominant role in their interaction and expressed the need for more prominent and informative haptic feedback.

“ I don't know how much I would have been able to tell the difference without the visual aid because I don't know if like my brain was so sensitive to what's happening with my hand. I think those visuals were super important.” (N6)

“ I did not feel that a lot. I saw a lot with the drops, but I did not feel very different things.” (T3)

3.4.7 Application and clinical use

All participants stated that they would feel comfortable using the device themselves in an unsupervised environment in the hypothetical scenario of undergoing upper-limb rehabilitation. Two out of the three participating therapists noted that they would use it with their patients, while one was not sure yet. The therapists saw potential applications either in early rehabilitation, group therapy, or home rehabilitation—in particular for patients with reduced tactile or proprioceptive sensibility.

“ I think when they have sensibility problems it's very difficult to give the right force to hold a glass or something. So people do that or it's too loose and it falls. So I think with this device you can maybe learn a little bit more and normally we do that with grabbing things. So I think it can be useful for that kind of problems.” (T3)

One therapist noted that stroke patients might benefit from adjustable assistance during the exercise. One mentioned that an initial assessment of patients' range of motion and available grasping force could be used to adjust the device and the game. Moreover, therapists highlighted the importance of variation during the rehabilitation training and suggested increasing the number of available exercises/games.

4 Discussion

4.1 We evolved our concept into a safe, aesthetic, and functional prototype

We developed a minimally-actuated device to meet the need for cost-effective haptic upper-limb training devices for minimally supervised or unsupervised neurorehabilitation. We realized a device that is inherently safe, suitable for a variety of hand sizes, and that can provide meaningful haptic feedback during the grasping of virtual objects by combining a compliant shell design with highly back-drivable actuation. We refined the device's appearance, and also added a passive DoF for wrist pronosupination, a movement highly recommended by therapists ( Rätz et al., 2021b ), by allowing the entire device to be tilted around its longitudinal axis. The combination of passive and active degrees of freedom is in line with the recommendations of Forbrigger et al. (2023a) , who suggested this concept to reduce cost while still providing high functionality. We thus satisfied all the required device improvements that we defined based on the first concept (see Section 2.1.1).

Our novel hand trainer is complemented by a serious game that challenges users to fill virtual cocktail glasses using simulated liquid dispensers with different haptic behaviors, highlighting the haptic capabilities of our device. Thereby, the difficulty of successfully filling the glass without spilling any liquid depends on the simulated liquid and varies across the different dispensers. The task mimics a scenario akin to ADL, as it requires precise grasping, force dosing and timing to succeed. Moreover, the game promotes finger extension, as users must open their hand before switching between liquid dispensers using pronosupination movements.

When compared to the state of the art—represented by similar devices like the PoRi ( Wolf et al., 2022 ) or the ReHandyBot (Articares Pte Ltd, Singapore)—our device exhibits a distinctly advantageous combination of portability, intrinsic safety, and setup simplicity. Functional differences are that the PoRi is more lightweight and can be freely moved in space by patients with advanced proximal upper-limb functions, while our device sits stably on a surface, making it also accessible for more impaired patients. The ReHandyBot, already available on the market, offers actuated pronosupination, although at the cost of increased complexity. While other studies consider devices of more than 50 kg still portable (e.g., Sivan et al., 2014 ), we agree with Lu et al. (2011) that a portable device should be compact and lightweight enough to be easily transported to patients' homes—preferably by patients themselves—and low-cost. The affordability of our device is enabled by the combination of one active with one passive DoF and a readily available low-cost microcontroller and IMU. Moreover, most parts could be manufactured from technical plastics, as we demonstrated by the mostly 3D-printed prototype. Currently, the main cost-driving elements are the high-end electric motor and motor driver, as they account for more than 50% of the device's price.

To evaluate our design, we performed a usability study in a simulated unsupervised environment with 13 healthy participants, of whom three were physiotherapists from Rijndam Rehabilitation, Rotterdam, the Netherlands. This experience allowed us to gain valuable insights into what needs to be dropped, added, kept, and improved in the following design iteration.

4.2 Lessons learned from the usability evaluation

4.2.1 Our device requires less than one minute to set up

The overall median setup time, including turning on, donning, and doffing, remained below one minute. This is one fifth of the maximum setup time therapists are willing to spend on robotic devices in inpatient rehabilitation ( Rätz et al., 2021b ). While setup-time requirements for home rehabilitation remain to be investigated, if we assume they are of a similar magnitude to those in a clinical setting, we are confident that our device's setup time is acceptable for home rehabilitation users. The very short doffing times observed once participants understood how the straps work suggest that the device could be donned and doffed even faster with more experience. Yet, it remains to be evaluated whether stroke survivors, especially those suffering from spasticity and unable to extend their fingers, can accomplish the device setup.

4.2.2 Overall, our haptic device is perceived as highly usable and intuitive

The entire system, i.e., the device and the game together, achieved an overall median PSSUQ rating of 70.2 out of 100, indicating good usability, while the device alone achieved a SUS score of 77.5, corresponding to good to excellent usability based on the ranges defined by Bangor et al. (2009) . These values are in line with those from other studies of devices for similar applications. For example, the HandyBot received SUS scores of 76.3 for the device itself and 85.0 for the GUI ( Ranzani et al., 2023 ). The user interface of the ReHapticKnob was rated 85.0, and two accompanying haptic games 76.3 and 68.8 ( Ranzani et al., 2021 ). The MERLIN device scored 71.9 in a home rehabilitation feasibility study ( Guillén-Climent et al., 2021 ). Lastly, a SUS score of 77.5 was reported for the GripAble device in a usability study with Parkinson's disease patients ( Saric et al., 2022 ).
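For readers less familiar with the SUS, the standard scoring rule behind these 0–100 values ( Brooke, 1996 ) can be sketched as follows; the function name is ours and not taken from any of the cited studies:

```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100) from the ten
    Likert responses (1-5), given in questionnaire order."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses):
        # Odd-numbered items (indices 0, 2, ...) are positively worded
        # and score (r - 1); even-numbered items are negatively worded
        # and score (5 - r).
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to the 0-100 range

# A neutral answer (3) to every item yields the midpoint score:
print(sus_score([3] * 10))  # -> 50.0
```

Scores around 77.5, as reported here, therefore correspond to consistently favorable answers across the ten items.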

The semi-structured interviews allowed us to gain a deep insight into participants' opinions. In general, the device was considered user-friendly, and participants highlighted that it looked sleek, portable, and simple, endorsing the overall concept. Interestingly, participants hardly mentioned the shell during the interviews, suggesting that the interaction felt natural and intuitive. This is supported by the eye-tracking data, which show that participants rarely looked at the device itself while playing the game. Apparently, looking at the device after donning it was rarely necessary, indicating a generally intuitive and seamless human-device interaction. Importantly, the gaze rate on the emergency stop was marginal, possibly reflecting that participants felt safe while playing, or indicating a high level of involvement in the game.

The results from the RTLX questionnaire, which reflect participants' perceived workload during the experiment, seem to support the idea that the system was perceived as intuitive. With a median score of 20 and no data point higher than 30, the participants' frustration level appears acceptable given that they used the device for the first time. Furthermore, the median scores for mental, physical, and temporal demand were below 25, although with larger dispersion. Yet, while lower RTLX values (except for the inverted Performance item) are preferable for rehabilitation device interfaces ( Ranzani et al., 2021 ), it cannot be stated in general that mental, physical, and temporal demand, as well as effort, should be as low as possible for games or exercises. On the contrary, to achieve high exercise intensity, for example, a larger (perceived) effort is typically desirable ( Eston et al., 1987 ; Church et al., 2021 ). To promote neuroplasticity, which is the ultimate goal of this device, performance should be high enough to keep the user motivated, but low enough to leave room for improvement ( Guadagnoli and Lee, 2004 ). The perceived median performance score of 70, combined with the perceived effort score of 45, indicates that the difficulty might have been appropriate for the skill level of the healthy participants. This is supported by the perceived competence subscale of the IMI, which is in line with the RTLX performance item.
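As background for these numbers, the RTLX ("raw" NASA-TLX; Hart, 2006 ) is simply the unweighted mean of the six subscale ratings, dropping the pairwise-comparison weighting of the full TLX; a minimal sketch, with an illustrative function name:

```python
def rtlx(mental, physical, temporal, performance, effort, frustration):
    """Raw TLX (RTLX): the unweighted mean of the six NASA-TLX subscale
    ratings (each 0-100). In the standard NASA-TLX, the Performance item
    is anchored from perfect to failure; studies often invert it so that
    higher means better, as done for the scores reported above."""
    return (mental + physical + temporal
            + performance + effort + frustration) / 6.0

print(rtlx(20, 20, 20, 20, 20, 20))  # -> 20.0
```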

4.2.3 We should invest in game personalization

While some participants reported that they loved the game, others found it very boring. This is reflected in the score of the Enjoyment/Interest subscale of the IMI (64.3 out of 100), which indicates good but not high median intrinsic motivation during the experiment ( Reynolds, 2007 ). As a comparison, the MERLIN device scored 85.7 in a home rehabilitation setting over a duration of a few weeks ( Guillén-Climent et al., 2021 ). We presume that our study's lower score might be explained by the participants' varying interest in the game, which potentially influenced their intrinsic motivation. The diversity of participants' feelings and opinions not only highlights the need to further improve our game but also shows that multiple, different games would be necessary for an at-home study with patients. A collection of interesting and diverse games is a prerequisite for successful home rehabilitation. Indeed, it has been observed that usage times of robotic devices at home are low when patients report a lack of complexity and enjoyment in the games ( Sivan et al., 2014 ). For rehabilitation with stroke patients in particular, it will be important to provide difficulty levels tailored to each patient's abilities ( Colombo et al., 2007 ).

Moreover, multiple participants pointed out that a more elaborate scoring system (e.g., a personal high score) could increase their motivation. Both the interviews and the eye-tracking data showed varying utilization and understanding of the life bar (which reflects performance), for which we see two reasons: i) Although the life bar was shown in the instruction slides, we did not explicitly explain how it works, and five minutes of play might have been too short for some participants to implicitly learn the relation between the life bar and spilling. ii) The interviews revealed that some participants did not notice the life bar at all.

4.2.4 There is room for improvement in the wrist fixation and the passive pronosupination degree of freedom

Twelve of the thirteen participants were able to set up the device and play the game with no or minimal intervention. Yet, we identified a few practical and technical issues. With regard to practical issues, i.e., those related to misconceptions or incorrect manipulation, we found that five of the eleven occurrences stemmed from participants having difficulties with the magnetic wrist lock. While this particular issue did not require the experimenters to intervene, it points to a usability problem. This was unexpected, as this part was specifically designed to facilitate the setup. The finding is supported by comments from the semi-structured interviews indicating that the instructions regarding the wrist fixation might not have been clear enough.

The passive pronosupination DoF also drew the participants' attention. Not only did it cause one of the practical issues, but multiple participants also reported that the movements were not straightforward. One reason could be that the rounding of the bottom edges of the device is uniform along its length, i.e., cylindrical apart from the central flat part. This contrasts with the literature, which describes pronosupination as the rolling movement of a cone whose apex is at the elbow ( Kapandji, 1982 ). Thus, the rolling of the device might not correspond to natural, physiological pronosupination. Another reason could be that the flat bottom, which we designed for stability, actually seems to have discouraged users from tilting the device. The pronosupination issue might have been further aggravated by the wrist position, which, according to the interviews, could become uncomfortable during prolonged use. Indeed, the wrist is in a slightly ulnar-abducted position due to the elevated hand position with respect to the elbow.

4.2.5 The haptic rendering is generally well perceived

Participants generally appreciated the realistic haptic sensation and how the haptic feedback correlated with their grasping actions. Yet, a few participants mentioned that they did not consciously notice or use the haptic feedback. We suggest four possible explanations: i) The haptic forces were not strong enough. ii) The haptic feedback worked well and was highly coherent with the game, so participants did not realize that the rendered forces were generated artificially. iii) Participants might have confounded the inherent springiness of the shell with the haptic rendering. iv) Participants subconsciously noticed the haptic feedback but did not perceive it as informative, as they might have relied on the visual feedback instead, as suggested, for example, by the answers of N1 and N6.

Although it is likely that some participants mistook the haptic rendering for the inherent compliance of the shell, points (i), (ii), and (iv) would require further investigation to be confirmed or ruled out, for example in a within-subject study with haptic and non-haptic conditions. We can, however, comment on point (i): it is indeed possible that the stiffness and damping values were chosen too low. A stiffness of 2 N/mm is required for an object to be perceived as rigid ( Massie and Salisbury, 1994 ), while our stiffest object was only 0.6 N/mm. The chosen values were intended to represent the deformable dispensers well. However, it might have been beneficial to choose higher values, or at least to accentuate the impact when touching a dispenser (e.g., with more distinct values of the K and B gains or with vibratory cues).
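The K and B gains mentioned above refer to spring-damper (impedance-type) haptic rendering. The following is a minimal sketch of one rendering step under that scheme, with illustrative names and units rather than the device's actual firmware:

```python
def render_force(x, x_dot, x_wall, K, B):
    """One step of spring-damper (impedance-type) haptic rendering.

    x      -- current shell position (mm)
    x_dot  -- current shell velocity (mm/s)
    x_wall -- position where the virtual object surface starts (mm)
    K, B   -- stiffness (N/mm) and damping (N*s/mm) gains
    """
    penetration = x - x_wall
    if penetration <= 0.0:
        return 0.0  # not touching the virtual object: no force
    force = K * penetration + B * x_dot
    return max(force, 0.0)  # never pull the user into the object

# Squeezing 2 mm into the stiffest object (K = 0.6 N/mm) at rest:
print(render_force(x=12.0, x_dot=0.0, x_wall=10.0, K=0.6, B=0.05))  # -> 1.2
```

With K capped at 0.6 N/mm, even a deep squeeze produces only a gentle restoring force, which is consistent with the hypothesis in point (i) that the rendering may have been too subtle to notice consciously.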

4.2.6 Instructions are of critical importance for devices in minimally supervised environments

The other practical issues occurred only once each and included instances where participants either did not adhere to the provided instructions or manipulated the device too early or too late. We also noted confusion related to the game, in particular to the life bar and the pronosupination movements, which could indicate that parts of the instructions were not clear. This is supported by several statements from the semi-structured interviews. Indeed, we believe that unclear instructions were the main reason for some of the low scores on the Information Quality subscale of the PSSUQ. This subscale showed the highest score dispersion among the PSSUQ subscales, revealing that participants' perceptions of this aspect were very diverse: some were completely satisfied with the provided information, while others desired improvements.

This brings us to an important lesson for device development for unsupervised settings: the instructions are as important as the device and the exercises themselves. In hindsight, we must acknowledge that we focused on the device and the game during development. This calls for including other stakeholders, such as cognitive psychologists, in all design phases.

4.2.7 The device could benefit from improvements to make it more robust

Although technical issues may seem unfortunate at first sight (for example, for participant N10, who was not able to play the game for the full five minutes), they are an inherent aspect of early testing and a valuable opportunity to improve the device. The incident with N10 was most likely caused by slippage of the large gear pulley (the pulley with diameter d2 in Figure 1 ) on its axle due to insufficient clamping, which misaligned the motor encoder. The other technical issues call for further reliability testing of the software and the implementation of online error-checking routines with appropriate recovery measures. For example, a failed calibration can easily be detected by driving the shell along its entire range of motion and comparing the resulting travel with the expected distance.
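The proposed self-check could, for instance, look like the following sketch; `drive_to_limit`, the units, and the tolerance value are hypothetical placeholders, not an existing API of our device:

```python
def calibration_ok(drive_to_limit, expected_travel_mm, tolerance_mm=2.0):
    """Detect a failed calibration by sweeping the shell between its
    mechanical end stops and comparing the encoder-measured travel with
    the shell's known range of motion. A slipped pulley or a wrong
    encoder zero shows up as a travel mismatch."""
    lower = drive_to_limit(-1)  # fully close the shell
    upper = drive_to_limit(+1)  # fully open the shell
    measured_travel = upper - lower
    return abs(measured_travel - expected_travel_mm) <= tolerance_mm

# Simulated sweep of a healthy device with a 50 mm range of motion:
print(calibration_ok(lambda d: 0.0 if d < 0 else 50.5, 50.0))  # -> True
```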

4.3 Study limitations

Our study has some limitations. First, we did not include stroke patients in this first usability study. While including stroke patients in usability evaluations is undeniably important, involving non-expert participants and therapists can also contribute indispensable insights in the early stages of device development. Following the double diamond design process model, the initial discover and define phases are followed by the iterative develop and deliver phases, in which new designs are created and evaluated by end users ( Design Council, 2005 ). Ideally, patients should be included in all of these phases. Yet, the administrative work required to involve patients in testing is long and tedious, requiring approval from the local ethics committee every time a modification or improvement is made, which slows down the design process. Intermediate evaluation steps with healthy non-expert participants and therapists serving as proxies therefore already allow assessing basic functionality, general user experience, and initial usability challenges that are not exclusive to stroke patients. This helps detect usability problems early, allowing faster convergence to more appropriate solutions, saving time, and reducing the burden on patients.

Second, an inherent drawback of our experimental design is that the usability of the device itself might have been confounded with the quality of the instructions. A significant positive correlation between the quality of user instructions and perceived product quality has been shown ( Gök et al., 2019 ). Therefore, unclear instructions might have aggravated the perceived usability issues. However, in the case of this study, our rather minimalist set of instructions might actually have helped to extract the maximum amount of information from the experiment.

Third, the findings of our study could be limited by the participants' awareness of the experimenters' presence, as the unsupervised scenario was only simulated. While this setup allowed intervention in the case of practical or technical issues, it could have affected the participants' behavior compared to a fully unsupervised setting.

4.4 Next steps in our human-centered design approach

In this first usability study, we gathered valuable information, recommendations, and points for improvement to be exploited in the next design iteration. In short, we plan to: i) Adapt the bottom of the device and the wrist fixation to facilitate pronosupination movements while guaranteeing physiological positioning of the wrist. ii) Develop more games with different difficulty levels and an improved scoring system (e.g., a personal high score). iii) Accentuate the haptic rendering to provide more noticeable variations between different game objects. This might include more advanced techniques to further promote sensorimotor learning, such as haptic error modulation ( Marchal-Crespo et al., 2019 ; Basalp et al., 2021 ). iv) Change the modality of the instructions: instead of slides, we will explore video instructions. Moreover, we might check whether the user performed each action correctly before continuing to the next one. v) Further increase the portability of our system by removing the emergency stop buttons, potentially replacing the external power supply with a battery, and switching to wireless communication. This step will also require making the device more robust and reliable. vi) Integrate an absolute encoder and automatic detection of the installed shell size to eliminate the currently required calibration sequence. vii) Implement an assessment routine to determine the user's range of motion and grasping force. viii) Further lower the cost of the device, for example by replacing the motor and motor driver with a lower-cost solution or by redesigning complicated components. On this note, the general robustness might also be further improved in view of potential future large-scale studies.

Gathering patient feedback—potentially also in a longitudinal study—will be our main focus after realizing the aforementioned improvements. The combination of group therapy with home rehabilitation (where patients use the exact same device) has been suggested as a promising way of efficiently increasing therapy dosage ( McCabe et al., 2019 ) and could present a suitable use case for the next round of usability testing.

With respect to a possible commercialization of the device, distribution and support will become key factors to consider. Moreover, we will investigate various financial models to ensure the economic viability of such a relatively low-cost device once it is ready for commercialization. It has been shown that innovations with potentially high societal impact but lower economic value, such as low-cost medical devices like the one presented in this study, notoriously struggle to attract investment ( Allers et al., 2023 ). Thus, we must ensure that our device is not only low-cost but, above all, cost-efficient: it must not only deliver on its therapeutic promise but also provide an economic benefit to the healthcare system and to investors.

4.5 Conclusion

We presented the second iteration of a novel minimally-actuated haptic hand trainer for minimally supervised and unsupervised rehabilitation of patients with acquired brain injury, as well as an accompanying serious game. The introduction of a novel compliant shell mechanism allowed us to design a device that is simple and provides intuitive and intrinsically safe physical human-device interaction.

Following a human-centered iterative development approach, we performed a thorough analysis of the prototype's usability with therapists and healthy non-expert users. In a simulated unsupervised scenario, we asked the participants to set up the device and play a game based on a set of written instructions. Our mixed-method approach allowed us to gain insights into usability issues of our prototype. While the testing showed good overall usability of the device and the game, we identified various areas of improvement, such as the wrist fixation, the pronosupination movements, and instructions.

Our prototype shows promise for use in both minimally supervised therapy and unsupervised home rehabilitation. We look forward to further improving the device, deploying it with neurological patients, and contributing to the democratization of robotic rehabilitation in order to improve the quality of life of especially vulnerable patients.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Human Research Ethics Committee (HREC) of the Delft University of Technology (Application ID: 2216). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

RR: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing. AR: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Visualization, Writing – original draft, Writing – review & editing. NC-G: Conceptualization, Data curation, Formal analysis, Investigation, Writing – review & editing. GR: Conceptualization, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing. LM-C: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the Swiss National Science Foundation through the Grant PP00P2163800, the Dutch Research Council (NWO, VIDI Grant Nr. 18934), and the Convergence Flagship Human Mobility Center.


Acknowledgments

The authors would like to acknowledge the highly valued contribution of Jonas Kober during the mechatronic development of the presented device. We are also grateful for the help of Alberto Garzás Villar with the game development. Furthermore, we would like to thank the therapists from the Department of Neurology, University Hospital Bern, Switzerland, for their feedback during the development of the game and the device, and the therapists from the Rijndam Rehabilitation Center, Rotterdam, the Netherlands, for participating in the usability experiment. Finally, the authors highly appreciate the efforts of Katie Poggensee in proofreading the manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor JF-L declared a past co-authorship with the author LM-C.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at:

Akbari, A., Haghverd, F., and Behbahani, S. (2021). Robotic home-based rehabilitation systems design: from a literature review to a conceptual framework for community-based remote therapy during COVID-19 pandemic. Front. Robot. AI 8, 1–34. doi: 10.3389/frobt.2021.612331


Allers, S., Eijkenaar, F., van Raaij, E. M., and Schut, F. T. (2023). The long and winding road towards payment for healthcare innovation with high societal value but limited commercial value: A comparative case study of devices and health information technologies. Technol. Soc . 75, 102405. doi: 10.1016/j.techsoc.2023.102405


Bangor, A., Kortum, P., and Miller, J. (2009). Determining what individual SUS scores mean: Adding an adjective rating scale. J. Usabil. Stud . 4, 114–123. doi: 10.5555/2835587.2835589

Basalp, E., Wolf, P., and Marchal-Crespo, L. (2021). Haptic training: which types facilitate (re)learning of which motor task and for whom? Answers by a review. IEEE Trans. Haptics 14. doi: 10.1109/TOH.2021.3104518

Biddiss, E. A., and Chau, T. T. (2007). Upper limb prosthesis use and abandonment. Prosthetics & Orthot. Int . 31, 236–257. doi: 10.1080/03093640600994581

Bolognini, N., Russo, C., and Edwards, D. J. (2016). The sensory side of post-stroke motor rehabilitation. Restor. Neurol. Neurosci . 34, 571–586. doi: 10.3233/RNN-150606

Braun, V., and Clarke, V. (2006). Using thematic analysis in psychology. Qual. Res. Psychol . 3, 77–101. doi: 10.1191/1478088706qp063oa

Brooke, J. (1996). “SUS: a ‘quick and dirty' usability scale,” in Usability Evaluation in Industry (Boca Raton: CRC Press), 207–212.


Bullock, I. M., Zheng, J. Z., Rosa, S. D. L., Guertler, C., and Dollar, A. M. (2013). Grasp frequency and usage in daily household and machine shop tasks. IEEE Trans. Haptics 6, 296–308. doi: 10.1109/TOH.2013.6

Burke, J. W., McNeill, M. D., Charles, D. K., Morrow, P. J., Crosbie, J. H., and McDonough, S. M. (2009). Optimising engagement for stroke rehabilitation using serious games. Visual Comp . 25, 1085–1099. doi: 10.1007/s00371-009-0387-4

Camillieri, S. (2019). A paradigm shift for acute rehabilitation of stroke. J. Stroke Med . 2, 17–22. doi: 10.1177/2516608519848948

Chen, Y., Abel, K. T., Janecek, J. T., Chen, Y., Zheng, K., and Cramer, S. C. (2019). Home-based technologies for stroke rehabilitation: a systematic review. Int. J. Med. Informat . 123, 11–22. doi: 10.1016/j.ijmedinf.2018.12.001

Chi, N. F., Huang, Y. C., Chiu, H. Y., Chang, H. J., and Huang, H. C. (2020). Systematic review and meta-analysis of home-based rehabilitation on improving physical function among home-dwelling patients with a stroke. Arch. Phys. Med. Rehabil . 101, 359–373. doi: 10.1016/j.apmr.2019.10.181

Church, G., Smith, C., Ali, A., and Sage, K. (2021). What is intensity and how can it benefit exercise intervention in people with stroke? a rapid review. Front. Rehabilitat. Sci . 2, 722668. doi: 10.3389/fresc.2021.722668

Ciortea, V. M., Motoac, I., Ungur, R. A., Borda, I. M., Ciubean, A. D., and Irsay, L. (2021). Telerehabilitation: a viable option for the recovery of post-stroke patients. Appl. Sci . 11, 10116. doi: 10.3390/app112110116

Colombo, R., Pisano, F., Mazzone, A., Delconte, C., Micera, S., Carrozza, M. C., et al. (2007). Design strategies to improve patient motivation during robot-aided rehabilitation. J. Neuroeng. Rehabil . 4, 3. doi: 10.1186/1743-0003-4-3

Cramer, S. C., Dodakian, L., Le, V., See, J., Augsburger, R., McKenzie, A., et al. (2019). Efficacy of home-based telerehabilitation vs in-clinic therapy for adults after stroke: a randomized clinical trial. JAMA Neurol . 76, 1079–1087. doi: 10.1001/jamaneurol.2019.1604

Design Council (2005). A Study of the Design Process “ The Double Diamond ”. Available online at:

Eston, R. G., Davies, B. L., and Williams, J. G. (1987). Use of perceived effort ratings to control exercise intensity in young healthy adults. Eur. J. Appl. Physiol. Occup. Physiol . 56, 222–224. doi: 10.1007/BF00640648

Faster Whisper (2023). faster-whisper . Available online at: (accessed November 11, 2023).

Feigin, V. L., Brainin, M., Norrving, B., Martins, S., Sacco, R. L., Hacke, W., et al. (2022). World Stroke Organization (WSO): global stroke fact sheet 2022. Int. J. Stroke 17, 18–29. doi: 10.1177/17474930211065917

Fong, J., Crocher, V., Tan, Y., Oetomo, D., and Mareels, I. (2017). “EMU: A transparent 3D robotic manipulandum for upper-limb rehabilitation,” in 2017 International Conference on Rehabilitation Robotics (ICORR) (London: IEEE), 771–776.


Forbrigger, S., DePaul, V. G., Davies, T. C., Morin, E., and Hashtrudi-Zaad, K. (2023a). Home-based upper limb stroke rehabilitation mechatronics: challenges and opportunities. Biomed. Eng. Online 22, 67. doi: 10.1186/s12938-023-01133-8

Forbrigger, S., Liblong, M., Davies, T., DePaul, V., Morin, E., and Hashtrudi-Zaad, K. (2023b). Considerations for at-home upper-limb rehabilitation technology following stroke: Perspectives of stroke survivors and therapists. J. Rehabilitat. Assist. Technol. Eng . 10, 205566832311718. doi: 10.1177/20556683231171840

Garrett, J. W. (1971). The adult human hand: some anthropometric and biomechanical considerations. Human Factors . 13, 117–131. doi: 10.1177/001872087101300204

Gassert, R., and Dietz, V. (2018). Rehabilitation robots for the treatment of sensorimotor deficits: a neurophysiological perspective. J. Neuroeng. Rehabil . 15, 46. doi: 10.1186/s12984-018-0383-x

Gök, O., Ersoy, P., and Börühan, G. (2019). The effect of user manual quality on customer satisfaction: the mediating effect of perceived product quality. J. Product & Brand Manageme . 28, 475–488. doi: 10.1108/JPBM-10-2018-2054

Goldberg, J. H., and Wichansky, A. M. (2003). “Eye tracking in usability evaluation,” in The Mind's Eye (London: Elsevier), 493–516.

Guadagnoli, M. A., and Lee, T. D. (2004). Challenge point: a framework for conceptualizing the effects of various practice conditions in motor learning. J. Mot. Behav . 36, 212–224. doi: 10.3200/JMBR.36.2.212-224

Guillén-Climent, S., Garzo, A., Mu noz-Alcaraz, M. N., Casado-Adam, P., Arcas-Ruiz-Ruano, J., Mejías-Ruiz, M., et al. (2021). A usability study in patients with stroke using MERLIN, a robotic system based on serious games for upper limb rehabilitation in the home setting. J. Neuroeng. Rehabil . 18, 1–16. doi: 10.1186/s12984-021-00837-z

Haakenstad, A., Irvine, C. M. S., Knight, M., Bintz, C., Aravkin, A. Y., Zheng, P., et al. (2022). Measuring the availability of human resources for health and its relationship to universal health coverage for 204 countries and territories from 1990 to 2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet 399, 2129–2154. doi: 10.1016/S0140-6736(22)00532-3

Hackett, M. L., Vandal, A. C., Anderson, C. S., and Rubenach, S. E. (2002). Long-term outcome in stroke patients and caregivers following accelerated hospital discharge and home-based rehabilitation. Stroke 33, 643–645. doi: 10.1161/str.33.2.643

Handelzalts, S., Ballardini, G., Avraham, C., Pagano, M., Casadio, M., and Nisky, I. (2021). Integrating tactile feedback technologies into home-based telerehabilitation: opportunities and challenges in light of COVID-19 pandemic. Front. Neurorobot . 15, 617636. doi: 10.3389/fnbot.2021.617636

Hart, S. G. (2006). NASA-task load index (NASA-TLX); 20 years later. Proc. Human Factors Ergon. Soc . 2006, 904–908. doi: 10.1177/154193120605000909

Hesse, S., Heß, A., Werner, C.C, Kabbert, N., and Buschfort, R. (2014). Effect on arm function and cost of robot-assisted group therapy in subacute patients with stroke and a moderately to severely affected arm: a randomized controlled trial. Clin. Rehabil . 28, 637–647. doi: 10.1177/0269215513516967

Hyakutake, K., Morishita, T., Saita, K., Fukuda, H., Shiota, E., Higaki, Y., et al. (2019). Effects of home-based robotic therapy involving the single-joint hybrid assistive limb robotic suit in the chronic phase of stroke: a pilot study. Biomed Res. Int . 2019, 5462694. doi: 10.1155/2019/5462694

Jacob, R. J., and Karn, K. S. (2003). “Eye tracking in human-computer interaction and usability research,” in The Mind's Eye (London: Elsevier).

Kapandji, I. A. (1982). Physiology of the Joints, Volume 1: Upper Limb, 5th Edn .

Kwakkel, G., Kollen, B. J., Van der Grond, J. V., and Prevo, A. J. (2003). Probability of regaining dexterity in the flaccid upper limb: impact of severity of paresis and time since onset in acute stroke. Stroke 34, 2181–2186. doi: 10.1161/01.STR.0000087172.16305.CD

Kwakkel, G., van Peppen, R., Wagenaar, R. C., Wood Dauphinee, S., Richards, C., Ashburn, A., et al. (2004). Effects of augmented exercise therapy time after stroke. Stroke 35, 2529–2539. doi: 10.1161/01.STR.0000143153.76460.7d

Lai, S.-M., Studenski, S., Duncan, P. W., and Perera, S. (2002). Persisting consequences of stroke measured by the stroke impact scale. Stroke 33, 1840–1844. doi: 10.1161/01.STR.0000019289.15440.F2

Lambercy, O., Lehner, R., Chua, K., Wee, S. K., Rajeswaran, D. K., Kuah, C. W. K., et al. (2021). Neurorehabilitation from a distance: can intelligent technology support decentralized access to quality therapy? Front. Robot. AI 8, 1–9. doi: 10.3389/frobt.2021.612415

Lewis, J. R. (2002). Psychometric evaluation of the PSSUQ using data from five years of usability studies. Int. J. Hum. Comput. Interact . 14, 463–488. doi: 10.1207/S15327590IJHC143&4_11

Li, L., Fu, Q., Tyson, S., Preston, N., and Weightman, A. (2021). A scoping review of design requirements for a home-based upper limb rehabilitation robot for stroke. Top. Stroke Rehabil . 00, 1–15. doi: 10.1109/ICIEA51954.2021.9516098

Lohse, K., Shirzad, N., Verster, A., Hodges, N., and Van Der Loos, H. F. (2013). Video games and rehabilitation: Using design principles to enhance engagement in physical therapy. J. Neurol. Phys. Ther . 37, 166–175. doi: 10.1097/NPT.0000000000000017

Lu, E. C., Wang, R. H., Hebert, D., Boger, J., Galea, M. P., and Mihailidis, A. (2011). The development of an upper limb stroke rehabilitation robot: identification of clinical practices and design requirements through a survey of therapists. Disab. Rehabilitat. Assist. Technol . 6, 420–431. doi: 10.3109/17483107.2010.544370

Maramba, I., Chatterjee, A., and Newman, C. (2019). Methods of usability testing in the development of eHealth applications: a scoping review. Int. J. Med. Inform . 126, 95–104. doi: 10.1016/j.ijmedinf.2019.03.018

Marchal-Crespo, L., Tsangaridis, P., Obwegeser, D., Maggioni, S., and Riener, R. (2019). Haptic error modulation outperforms visual error amplification when learning a modified gait pattern. Front. Neurosci . 13, 61. doi: 10.3389/fnins.2019.00061

Massie, T. H., and Salisbury, J. K. (1994). “The PHANTOM haptic interface: a device for probing virtual objects threee enabling observations three necessary criteria for an effective interface,” in ASME Winter Annual Meeting, Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems . Dynamic Systems and Control: Volume 55, no. 1 (American Society of Mechanical Engineers).

McAuley, E. D., Duncan, T., and Tammen, V. V. (1989). Psychometric properties of the intrinsic motivation inventoiy in a competitive sport setting: a confirmatory factor analysis. Res. Q. Exerc. Sport 60, 48–58. doi: 10.1080/02701367.1989.10607413

McCabe, J. P., Henniger, D., Perkins, J., Skelly, M., Tatsuoka, C., and Pundik, S. (2019). Feasibility and clinical experience of implementing a myoelectric upper limb orthosis in the rehabilitation of chronic stroke patients: a clinical case series report. PLoS ONE 14, 1–12. doi: 10.1371/journal.pone.0215311

Mercier, L., Audet, T., Hebert, R., Rochette, A., and Dubois, M.-F. (2001). Impact of motor, cognitive, and perceptual disorders on ability to perform activities of daily living after stroke. Stroke 32, 2602–2608. doi: 10.1161/hs1101.098154

Metzger, J.-C., Lambercy, O., Chapuis, D., and Gassert, R. (2011). “Design and characterization of the ReHapticKnob, a robot for assessment and therapy of hand function,” in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (San Francisco: IEEE), 3074–3080.

Meyer, J. T., Gassert, R., and Lambercy, O. (2021). An analysis of usability evaluation practices and contexts of use in wearable robotics. J. Neuroeng. Rehabil . 18, 1–16. doi: 10.1186/s12984-021-00963-8

Meyer, J. T., Tanczak, N., Kanzler, C. M., Pelletier, C., Gassert, R., and Lambercy, O. (2023). Design and validation of a novel online platform to support the usability evaluation of wearable robotic devices. Wearable Technol . 4, 31. doi: 10.1017/wtc.2022.31

NICE (2023). “Stroke rehabilitation in adults (update),” in Number October . London: National Institute for Health and Care Excellence.

Pandian, S., Arya, K. N., and Davidson, E. W. (2012). Comparison of Brunnstrom movement therapy and motor relearning program in rehabilitation of post-stroke hemiparetic hand: A randomized trial. J. Bodyw. Mov. Ther . 16, 330–337. doi: 10.1016/j.jbmt.2011.11.002

Pettypiece, C. E., Goodale, M. A., and Culham, J. C. (2010). Integration of haptic and visual size cues in perception and action revealed through cross-modal conflict. Exp. Brain Res . 201, 863–873. doi: 10.1007/s00221-009-2101-1

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. (2023). “Robust speech recognition via large-scale weak supervision,” in International Conference on Machine Learning (PMLR) , 28492–28518.

Ranzani, R., Albrecht, M., Haarman, C. J., Koh, E., Devittori, G., Held, J. P., et al. (2023). Design, characterization and preliminary usability testing of a portable robot for unsupervised therapy of hand function. Front. Mechan. Eng . 8, 1–17. doi: 10.3389/fmech.2022.1075795

Ranzani, R., Eicher, L., Viggiano, F., Engelbrecht, B., Held, J. P. O., Lambercy, O., et al. (2021). Towards a platform for robot-assisted minimally-supervised therapy of hand function: design and pilot usability evaluation. Front. Bioeng. Biotechnol . 9, 2021.01.12.21249685. doi: 10.3389/fbioe.2021.652380

Rätz, R., Conti, F., Müri, R. M., and Marchal-Crespo, L. (2021a). A novel clinical-driven design for robotic hand rehabilitation: combining sensory training, effortless setup, and large range of motion in a palmar device. Front. Neurorobot . 15, 1–22. doi: 10.3389/fnbot.2021.748196

Rätz, R., Müri, R. M., and Marchal-Crespo, L. (2021b). “Assessment of clinical requirements for a novel robotic device forupper-limb sensorimotor rehabilitation after stroke,” in Proceedings of the 5th International Conference on Neurorehabilitation (ICNR2020) , eds. D. Torricelli, M. Akay, and J. L. Pons (Vigo: Springer International Publishing).

Rätz, R., Van Damme, N., Marchal-Crespo, L., and Aksöz, E. A. (2023). WO2023222678A1 Sensomotoric Hand Therapy Device (Patent) , 5th Edn. Churchill Livingstone.

Renner, C. I., Outermans, J., Ludwig, R., Brendel, C., Kwakkel, G., and Hummelsheim, H. (2016). Group therapy task training versus individual task training during inpatient stroke rehabilitation: A randomised controlled trial. Clin. Rehabil . 30, 637–648. doi: 10.1177/0269215515600206

Reynolds, L. (2007). “Measuring intrinsic motivations,” in Handbook of Research on Electronic Surveys and Measurements , R. A. Reynolds, R. Woods, and J. D. Baker (Pennsylvania: IGI Global), 170–173.

Rozevink, S. G., van der Sluis, C. K., and Hijmans, J. M. (2021). HoMEcare aRm rehabiLItatioN (MERLIN): preliminary evidence of long term effects of telerehabilitation using an unactuated training device on upper limb function after stroke. J. Neuroeng. Rehabil . 18, 141. doi: 10.1186/s12984-021-00934-z

Saric, L., Knobel, S. E., Pastore-Wapp, M., Nef, T., Mast, F. W., and Vanbellingen, T. (2022). Usability of two new interactive game sensor-based hand training devices in Parkinson's disease. Sensors 22, 16. doi: 10.3390/s22166278

Schaarup, C., Hartvigsen, G., Larsen, L. B., Tan, Z. H., Arsand, E., and Hejlesen, O. K. (2015). Assessing the potential use of eye-tracking triangulation for evaluating the usability of an online diabetes exercise system. Stud. Health Technol. Inform . 216, 84–88.

Schneider, E. J., Lannin, N. A., Ada, L., and Schmidt, J. (2016). Increasing the amount of usual rehabilitation improves activity after stroke: a systematic review. J. Physiother . 62, 182–187. doi: 10.1016/j.jphys.2016.08.006

Scott, S. H. (2004). Optimal feedback control and the neural basis of volitional motor control. Nat. Rev. Neurosci . 5, 532–544. doi: 10.1038/nrn1427

Simple Diarizer (2023). simple_diarizer . Available online at: (accessed November 11, 2023).

Sivan, M., Gallagher, J., Makower, S., Keeling, D., Bhakta, B., O'Connor, R. J., et al. (2014). Home-based Computer Assisted Arm Rehabilitation (hCAAR) robotic device for upper limb exercise after stroke: Results of a feasibility study in home setting. J. Neuroeng. Rehabil . 11, 1. doi: 10.1186/1743-0003-11-163

Sugawara, A. T., Ramos, V. D., Alfieri, F. M., and Battistella, L. R. (2018). Abandonment of assistive products: assessing abandonment levels and factors that impact on it. Disabil. Rehabilitat. Assist. Technol . 13, 716–723. doi: 10.1080/17483107.2018.1425748

Sullivan, G. M., and Artino, A. R. (2013). Analyzing and interpreting data from likert-type scales. J. Grad. Med. Educ . 5, 541–542. doi: 10.4300/JGME-5-4-18

Swanson, V. A., Johnson, C., Zondervan, D. K., Bayus, N., McCoy, P., Ng, Y. F. J., et al. (2023). Optimized home rehabilitation technology reduces upper extremity impairment compared to a conventional home exercise program: a randomized, controlled, single-blind trial in subacute stroke. Neurorehabilitat. Neural Repair . 15, 4596832211469. doi: 10.1177/15459683221146995

Tollár, J., Nagy, F., Csutorás, B., Prontvai, N., Nagy, Z., Török, K., et al. (2021). High frequency and intensity rehabilitation in 641 subacute ischemic stroke patients. Arch. Phys. Med. Rehabil . 102, 9–18. doi: 10.1016/j.apmr.2020.07.012

Tollár, J., Vetrovsky, T., Széphelyi, K., Csutorás, B., Prontvai, N., ács, P., et al. (2023). Effects of 2-year-long maintenance training and detraining on 558 subacute ischemic stroke patients' clinical motor symptoms. Med. Sci. Sports & Exerc . 55, 607–613. doi: 10.1249/MSS.0000000000003092

Van Damme, N., Ratz, R., and Marchal-Crespo, L. (2022). “Towards unsupervised rehabilitation: development of a portable compliant device for sensorimotor hand rehabilitation,” in IEEE International Conference on Rehabilitation Robotics 2022-July:25–29 (Rotterdam: IEEE).

Vlachogianni, P., and Tselios, N. (2023). Perceived Usability Evaluation of Educational Technology Using the Post-Study System Usability Questionnaire (PSSUQ ): A Systematic Review .

Ward, N. S., Brander, F., and Kelly, K. (2019). Intensive upper limb neurorehabilitation in chronic stroke: Outcomes from the Queen Square programme. J. Neurol. Neurosurg. Psychiat . 90, 498–506. doi: 10.1136/jnnp-2018-319954

Wilson, P. H., Rogers, J. M., Vogel, K., Steenbergen, B., McGuckian, T. B., and Duckworth, J. (2021). Home-based (virtual) rehabilitation improves motor and cognitive function for stroke patients: a randomized controlled trial of the Elements (EDNA-22) system. J. Neuroeng. Rehabil . 18, 165. doi: 10.1186/s12984-021-00956-7

Wittmann, F., Held, J. P., Lambercy, O., Starkey, M. L., Curt, A., Höver, R., et al. (2016). Self-directed arm therapy at home after stroke with a sensor-based virtual reality training system. J. Neuroeng. Rehabil . 13, 1–10. doi: 10.1186/s12984-016-0182-1

Wolf, K., Mayr, A., Nagiller, M., Saltuari, L., Harders, M., and Kim, Y. (2022). PoRi device: portable hand assessment and rehabilitation after stroke. Automatisierungstechnik 70, 1003–1017. doi: 10.1515/auto-2022-0037

Wolf, S. L., Biology, C., Sahu, K., Bay, R. C., Buchanan, S., Healthcare, S., et al. (2015). The HAAPI (home arm assistance progression initiative) trial: - a novel robotics delivery approach in stroke rehabilitation. Neurorehabil. Neural Repair 29, 958–968. doi: 10.1177/1545968315575612

Zbytniewska-Mégret, M., Salzmann, C., Kanzler, C. M., Hassa, T., Gassert, R., Lambercy, O., et al. (2023). The evolution of hand proprioceptive and motor impairments in the sub-acute phase after stroke. Neurorehabil. Neural Repair . 37, 823–836. doi: 10.1177/15459683231207355

Keywords: neurorehabilitation, robotic, home rehabilitation, group therapy, haptic rendering, portable, grasping, usability

Citation: Rätz R, Ratschat AL, Cividanes-Garcia N, Ribbers GM and Marchal-Crespo L (2024) Designing for usability: development and evaluation of a portable minimally-actuated haptic hand and forearm trainer for unsupervised stroke rehabilitation. Front. Neurorobot. 18:1351700. doi: 10.3389/fnbot.2024.1351700

Received: 06 December 2023; Accepted: 20 March 2024; Published: 04 April 2024.

Copyright © 2024 Rätz, Ratschat, Cividanes-Garcia, Ribbers and Marchal-Crespo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Alexandre L. Ratschat,

This article is part of the Research Topic: Rehabilitation Robotics: Challenges in Design, Control, and Real Applications Volume II.
