Developing Formative Assessment in the Classroom: Using action research to explore and modify theory

Harry Torrance

2010, Wiley Blackwell

Related Papers

Brendan Kean

Martina Maněnová

Wiyaka

Learning assessments can serve two purposes, formative and summative. The first is commonly associated with assessment for learning, while the latter is regarded as assessment of learning. However, formative assessment has often played second fiddle to summative assessment. Although several studies have shown the significant effect of formative assessment on student achievement, many teachers do not practice formative assessment for various reasons. This study aims to discover English teachers’ experiences regarding formative assessment so as to identify the key problems and challenges they face in implementing it. This research employed a qualitative design in which a deep exploration of the phenomenon was conducted. The study revealed that the teachers’ main constraints lay in insufficient knowledge of formative assessment, time management, and the absence of formal guidance from the authorities.

Patrick Kean

In the field of education, how and why teachers develop formative assessment in their classrooms and how this focus can improve student learning is under-researched. This thesis aimed to develop an understanding of how a range of formative assessment strategies can be developed and implemented in the classroom and enhanced through collaboration amongst teachers within professional learning teams. The study was conducted in three phases, including two cycles of action research within one school and a comparative school case study undertaken to deepen understanding of formative assessment and collaboration. Phase 1 involved action research in an international school in Hong Kong to develop formative assessment strategies in my own and two colleagues’ classrooms. The Phase 2 action research cycle investigated how formative assessment implementation could be enhanced and developed through collaboration within a professional learning team (PLT). Phase 3 was a comparative case study of fo...

Assessment in Education: Principles, Policy & Practice

Joshua B. Gardiner

Vjollca Ahmedi

The continual and swift reforms the education system in Kosovo has undergone in the past decade have continuously challenged the teaching staff. The aim of this study is to ascertain whether there is a connection between teachers’ attitudes towards formative assessment and the application of this assessment method. The alternative hypothesis is that there is a statistically significant correlation between teachers’ attitudes and actions towards formative assessment. Results indicate a noticeable correlation between teachers’ attitudes and practices towards formative assessment (r = 0.620). Tellingly, t-test results indicate that there are differences between attitudes towards formative assessment and its implementation in practice: the average of teachers’ attitudes towards formative assessment is higher than the average of teachers who apply formative assessment.

Lúcia Amante

In this article we present a research project whose results are the outcome of a teachers’ training course on the practice of formative assessment with the use of digital tools, applied to students from the third cycle of basic education through secondary education. Our key question, "How is formative assessment put into practice in the classrooms of the teachers involved in our training workshop?", stemmed from the following core objectives: understanding whether teachers integrate the technological dimension into the design of their students’ assessment and perceiving the sort of digital tools they use to promote teaching and the regulation of students’ learning. The focus of our training course, developed as a workshop, was to understand how the teachers’ practices had changed after attending the course. The study was developed as qualitative and interpretive research, and the teachers’ training plan was elaborated using an action-research methodology. The collected dat...

Chinese Journal of Applied Linguistics, 46(2), pp. 155-161

Peter Yongqi Gu

Formative assessment (FA) has been increasingly recognised as a powerful tool to improve teaching and learning, and thereby increase educational effectiveness. As such, FA has been written into government directives and curriculum standards and incorporated into teacher education programmes. At the classroom level, however, teachers have found FA a formidable task that is difficult to implement. This has been attributed to teachers’ lack of assessment literacy, among other reasons. In this guest-editor introduction, we frame the special issue and its scope by highlighting the main issues involved. We then briefly introduce the 10 articles which we believe, taken together, advance our understanding of teacher formative assessment literacy and its development.

British Educational Research Journal

Designing Formative Assessment That Improves Teaching and Learning: What Can Be Learned from the Design Stories of Experienced Teachers?

  • Open access
  • Published: 17 October 2023
  • Volume 7, pages 182–194 (2023)

  • Janneke van der Steen, ORCID: orcid.org/0000-0001-6335-4824
  • Tamara van Schilt-Mol, ORCID: orcid.org/0000-0002-4714-2300
  • Cees van der Vleuten, ORCID: orcid.org/0000-0001-6802-3119
  • Desirée Joosten-ten Brinke, ORCID: orcid.org/0000-0001-6161-7117

This article reports on findings of a qualitative study that investigated the difficulties teachers encounter while designing formative assessment plans and the strategies experienced teachers use to avoid those pitfalls. The pitfalls were identified through an analysis of formative assessment plans that searched for potential threats to alignment, decision-driven data collection, and room for adjustment and improvement. The main pitfalls in the design process occurred when teachers did not explicitly and coherently link all elements of their formative assessment plan or when they did not plan to effectively use information about student learning to improve teaching and learning. Through interviews with experienced teachers, we identified seven design strategies they used to design formative assessment plans that were aligned, consisted of decision-driven data collection, and left room for adjustment and improvement. However, these experienced teachers still encountered difficulties in determining how to formulate the right decisions for decision-driven data collection and how to provide students with enough room for improvement. Lessons learned from the design strategies of these experienced teachers are incorporated in design steps for formative assessment plans all teachers can use.

Introduction

Formative assessment can be seen as an ongoing process of monitoring students’ learning to decide which teaching and learning actions should be taken to better suit students’ needs (Allal, 2020 ; Black & Wiliam, 2009 ). Activities that are part of effective formative assessment include clarifying expectations, eliciting and analyzing evidence of student learning, communicating the outcomes with students, and performing suitable follow-up activities in teaching and learning (Antoniou & James, 2014 ; Ruiz-Primo & Furtak, 2007 ; Veugen et al., 2021 ). Formative assessment reveals students’ learning progress and what is needed to further this learning. Teachers can use this information to make better informed formative decisions about the next steps in teaching (Black & Wiliam, 2009 ).

Since formative assessment strengthens the connection between teaching and learning, it can be a solid intervention for improving both. However, implementing formative assessment that “works” is challenging for teachers. Research describes many pitfalls teachers can encounter when implementing formative assessment. For example, studies that investigated the implementation of formative assessment in practice conclude that, in order to be effective, more consideration is needed for the integration, coherency, and alignment of formative assessment in classroom practice (Gulikers et al., 2013; Van Den Berg, 2018; Wylie & Lyon, 2015). Formative assessment should be aligned with learning objectives, lesson activities, and other assessment activities (Biggs, 1996; Gulikers et al., 2013). Moreover, since learning objectives often span more than one lesson, this alignment of formative assessment should be considered across multiple related lessons.

Additionally, Wiliam ( 2013 , 2014 ) states that formative assessment activities that elicit evidence about student learning should also be designed in alignment with the decisions teachers wish to make based on the outcomes of these activities. Therefore, he recommends decision-driven data collection to ensure teachers and students receive the timely information they need to make well-informed formative decisions about the next steps in teaching and learning (Wiliam, 2013 ). However, teachers do not always incorporate decision-driven data collection in formative assessment, and this can be a pitfall when, for instance, existing data on student learning does not represent the current situation of learners or comes too late for a meaningful follow-up (Wiliam, 2013 ).

A recent study conducted by Veugen et al. ( 2021 ) examined students’ and teachers’ perceptions of formative assessment practice and revealed a final example of difficulties teachers seem to encounter when they implement formative assessment. Veugen et al. found that teachers who implement formative assessment use activities that clarify expectations and elicit and analyze student reactions. This results in feedback for the students, but the teachers report few adaptations to teaching and learning based on the outcomes of these activities. Without such follow-up activities, it is unlikely that formative assessment enhances student learning because students do not get the opportunity to use the feedback they were given, and teachers do not get the opportunity to adapt their teaching to students’ needs (Black & Wiliam, 2009 ; Veugen et al., 2021 ). Formative assessment is not complete without a follow-up where students and teachers have room for adjustment and improvement. This room can be created in lessons that follow the analysis and communication of evidence of student learning. In summary, teachers encounter a range of difficulties when implementing formative assessment. Formative assessment should be aligned, include decision-driven data collection, and leave room for adjustment and improvement but, in practice, these criteria are rarely met.

It seems to be a complex task for teachers to consider these three criteria when conducting formative assessment. As a result, some teachers succeed in enacting formative assessment as recommended in the literature, while others experience more difficulties in reaching this goal (Offerdahl et al., 2018 ; Veugen et al., 2021 ). Previous research suggests that pre-planning formative assessment is fundamental for teachers to ensure the effectiveness of these activities by encompassing all essential characteristics (van der Steen et al., 2022 ). So far, literature focusing on designing formative assessment has concentrated mainly on designing individual formative assessment activities (Furtak et al., 2018 ). However, only when teachers design formative assessment in plans that encompass multiple lessons can they tackle difficulties such as alignment and planning follow-up activities in an effective and feasible way (van der Steen et al., 2022 ; Wiliam, 2013 ). Taking a broad view of multiple lessons helps teachers consider the alignment between all lessons and activities that contribute to achieving the intended learning objectives. Furthermore, it increases their opportunities to plan for room for adjustment and improvement.

Based on the outcomes of earlier research (van der Steen et al., 2022 ), 64 teachers from four secondary schools were given time and knowledge to help them design formative assessment plans that met the criteria: alignment, decision-driven data collection, and room for adjustment and improvement. Still, even in this context, differences emerged between teachers when they were designing formative assessment. It seemed that teachers who already had experience with formative assessment in their classroom had an advantage in successfully designing formative assessment plans over teachers who did not yet have this experience.

Since there is a lack of literature about how teachers design formative assessment plans (van der Steen et al., 2022), it is unclear how teachers who successfully design and implement formative assessment go about designing their plans. Which design strategies do they use, and what can other teachers learn from their experiences? Therefore, the present study focuses on how experienced teachers design formative assessment plans aligned with learning objectives, lessons, and prospective formative decisions while taking follow-up actions into account. Once it becomes clear how teachers design such formative assessment plans, this knowledge can be used to support teachers who struggle with implementing formative assessment as intended. The outcomes of this study will therefore result in design steps and strategies for all teachers.

Accordingly, the research question for this study is:

How do experienced teachers design formative assessment plans that are aligned, include decision-driven data collection, and leave room for adjusting and improving teaching and learning?

Sub-questions that help answer this research question are:

What are pitfalls that can threaten the design of formative assessment plans that are aligned, include decision-driven data collection, and leave room for adjusting and improving teaching and learning?

How do experienced teachers design formative assessment plans that:

are aligned with learning objectives, lesson activities, and other assessment?

include decision-driven data collection?

leave room for adjusting and improving teaching and learning?

Material and Methods

The context of this study is an educational design research project funded by a grant which provided four secondary schools with the opportunity to advance formative assessment in their schools. At these schools, teachers designed formative assessment plans in teacher learning communities. Teacher learning communities are groups of teachers that come together for sustained periods of time — in this case, 15 meetings during a 16-month period — to engage in inquiry and problem solving with the goal of improving student learning (Van Es, 2012 ). The teacher learning communities in this study focused on improving formative assessment and formative decision-making by designing formative assessment plans. The activities in the teacher learning communities were coordinated by the first author, who provided the teachers with information and support in designing formative assessment plans.

Each school had a teacher learning community that consisted of 11 to 24 teachers — a total of 64 teachers across the four schools — who designed formative assessment plans for their lessons according to five design steps (Fig.  1 ). These five design steps are based on design principles for formative assessment plans that meet the three quality criteria: alignment (design steps 1 and 5), decision-driven data collection (design steps 2, 3, and 5), and room for adjustment and improvement (design step 4) (van der Steen et al., 2022 ).

figure 1

Five design steps for formative assessment plans

During a previous design cycle, teachers used an earlier version of the design steps for the first time. Thus, most teachers had experience designing a formative assessment plan prior to this study. The design steps were evaluated and adjusted based on group interviews and an analysis of the formative assessment plans designed during that first design cycle. The adjustments mainly concentrated on making the design steps more concise and emphasizing, within the design steps, the importance of communication with students, the link with formative decision-making, and planning room for students and teachers to improve.

In this study, the formative assessment plans the teachers designed and design stories of experienced teachers are used to answer the research questions. Figure  2 shows an example of a formative assessment plan.

figure 2

Example of a formative assessment plan designed by Tracy

This study has a qualitative research design. The first sub-question will be answered by analyzing the collected formative assessment plans for the presence and different appearances of alignment, decision-driven data collection, and room for adjustment and improvement, together with the pitfalls that prevent formative assessment plans from meeting these criteria. The second sub-question about what experienced teachers do to avoid these pitfalls will be answered based on interviews with experienced teachers who participated in this project.

Participants

Thirty-one teachers of 15 subjects from the four participating secondary schools were involved in answering sub-question 1 (see Table 1). All teachers (n = 64) who participated in the teacher learning communities at one of the four schools were asked to send their formative assessment plans if they had finalized them by that time. To obtain a representative and information-rich overview of the pitfalls teachers encounter while designing formative assessment, regardless of their experience in teaching and formative assessment or the subjects they taught, the researchers aimed to gather all finished formative assessment plans. This request resulted in 26 formative assessment plans from 31 teachers (presented in Table 1).

To answer sub-question 2, experienced teachers were recruited from the participating schools. To ensure the interviews provided in-depth information for this study, the teachers had to be actively involved in the design project and have multiple years of experience with formative assessment so they could really understand and explain their choices and considerations in the design process.

All four participating schools were asked to find two teachers for the interviews who (1) agreed to contribute to this research via interviews, (2) had successfully finished designing their second formative assessment plan, and (3) had experience with formative assessment prior to the start of the teacher learning communities. At one school, where working with formative assessment was relatively new, no teachers who met these criteria could be found. The other three schools each found two teachers, as presented in Table 2 (all names are pseudonyms).

The table does not show the schools at which these teachers are employed, and a more general description was chosen for the language teachers to ensure their anonymity.

Analysis of Formative Assessment Plans

The 26 formative assessment plans came from 15 subjects and all four schools, varying from five to eight plans per school. That variety makes it likely that this collection of plans can provide a representative sample of formative assessment plans designed based on the five design steps, so there was no need to gather more plans.

Criteria for Analyzing Formative Assessment Plans

Before analyzing the plans to answer sub-question 1, a description was made of the criteria each element must meet to receive a positive evaluation. These were as follows:

Alignment : plans should show proof of coherency between learning objectives, lesson activities, and assessment activities.

Decision-driven data collection : plans should show proof that eliciting data about student learning was linked to a predetermined formative decision about the next steps in teaching and learning.

Room for adjustment and improvement : plans should show proof of space and opportunity for both teachers and learners to adjust and/or improve based on the information about learning that was collected.

Procedure for Analyzing Formative Assessment Plans

The first step in analyzing the formative assessment plans was evaluating the plans on the three criteria: alignment, decision-driven data collection, and room for adjustment and improvement. This analysis was conducted by two researchers individually: the first author and one colleague researcher.

Second, for each criterion, the researchers discussed the differences in their assessments of the quality of the plans before addressing the differences and similarities between the plans that succeeded in meeting the criterion and the plans that did not. What pitfalls appear in the plans that did not meet a criterion, and what can be learned from the plans that did? Some plans were described and explained more elaborately than others. Therefore, the results in this study reflect an analysis of the pitfalls in the clearly and elaborately described plans and the possible pitfalls in the less well-described plans.

Interviews with Experienced Teachers

Semi-structured teacher interviews were used to answer sub-question 2 and gain deep insight into the steps experienced teachers take to design their formative assessment plans. What did they do in addition to or differently from the five design steps, and how did this contribute to meeting the three quality criteria for formative assessment plans (sub-question 2)?

Guide for Interviewing Experienced Teachers

In the interviews, teachers were asked about their design process. Each interview started with the question: “How did designing this formative assessment plan come about?” Possible follow-up questions were: (a) “What did you do?,” (b) “Which steps did you take in designing this plan?,” and (c) “What difficulties did you encounter in designing this plan, and how did you resolve them?”

After the teachers explained their design process in their own words, the conversation turned to comparing the design steps to the design story the teacher had just shared. Where had the teacher followed the five design steps, and where did their process differ? For example, the teachers were asked about their choices in step 2 of the design process about when and why they had planned checkpoints, what they chose to do at each checkpoint to elicit information about student learning, and whether they had linked checkpoints together.

Procedure for Analyzing the Interviews and Writing Narratives of Experienced Teachers

The interviews were transcribed and coded through template analysis (Brooks et al., 2015 ). The statements and comments about the teachers’ decisions and actions during the process of designing their formative assessment plan were coded using the five design steps (Fig.  1 ) and put into a narrative for each teacher. Based on each narrative, the researchers used the teachers’ choices and actions that contributed to alignment, decision-driven data collection, and room for adjustment and improvement to answer the questions about what experienced teachers do in their design process to meet the three criteria this study focuses on (sub-questions 2a, 2b, and 2c).

Pitfalls in Designing Formative Assessment Plans

Sub-question 1 was: “What are pitfalls that can threaten the design of formative assessment plans that are aligned, include decision-driven data collection, and leave room for adjusting and improving teaching and learning?” The results from analyzing the formative assessment plans are presented per criterion.

The main pitfalls related to alignment were:

Learning objectives, lesson activities, and/or assessment activities were missing from the plan.

Learning objectives, lesson activities, and/or assessment activities were not clearly described.

Learning objectives, lesson activities, and/or assessment activities were not explicitly linked.

There was a mismatch between the learning objectives, lesson activities, and assessment activities. For example, the final test or formative assessment did not match the learning objectives.

The three main pitfalls related to decision-driven data collection were:

The absence of formative decisions that the data collection was based on. Specific formative decisions were not explicitly linked to each checkpoint and could not be deduced from the planned follow-up. When these decisions are unclear, it is impossible to determine whether the corresponding data collection was accurate for and aligned with the decision at hand.

Situations in which predetermined formative decisions were present in a plan, but the data collection about student learning was not expected to collect the necessary information to inform the corresponding formative decision. The teacher did not plan to assess what they really needed to know. Four specific examples were found:

A mismatch between the data collection method and the necessary data. For example, a teacher who wants to find out about students’ speaking but uses mini whiteboards.

Situations in which teachers found a logical method but did not ask the right questions or use the proper assignments to discover where students’ learning stood on a specific learning objective and/or what the next best step would be.

Situations in which teachers only planned to collect information on learning from a few students, but they wanted to use this information to make a formative decision for all students.

The planned data collection only shows whether a student has achieved a learning objective; it does not lead to information about what is needed in the follow-up to take learning further.

Situations in which teachers found the right way to find the information on learning they needed, but the outcomes only became visible to the students (e.g., when students only give each other peer feedback). When outcomes only become visible to students, teachers cannot use the information to inform their formative decisions because they do not have it.

The three pitfalls related to room for adjustment and improvement were:

Including no time for adjustment and improvement in the plan.

Including room only for teachers to adjust and improve, or only for students to do so, rather than incorporating and describing room for both.

Failing to confirm a follow-up by using another check to determine whether learning on a specific objective had increased because of the planned adjustment and improvement.

The Design of Formative Assessment Plans by Experienced Teachers

We used the narratives of six teachers (as presented in Online Resource 1 ) to answer sub-question 2: “How do experienced teachers design formative assessment plans that are aligned with learning objectives, lesson activities, and other assessment, include decision-driven data collection, and leave room for adjusting and improving teaching and learning?” The results are presented per criteria.

Alignment with Learning Objectives, Lesson Activities, and Other Assessment

All the teachers started with an existing series of lessons. According to Tracy, this is a coherent foundation from which to start when planning formative assessment if those lessons were designed based on the desired learning objectives (as they were in these cases). Tracy started with a series of lessons that also was aligned externally with an annual schedule that included all learning objectives and criteria for success and was consistent and aligned with the years ahead of or behind the current class. However, Patricia added that these planned series of lessons are not fixed. She stated that aligning formative assessment with existing lessons and lesson activities is an active process of determining whether lesson planning requires something different based on the choices made in designing formative assessment and simultaneously determining how formative assessment can enhance learning.

All the teachers described continuously checking for alignment during the design process, looking at the learning objectives, (formative) assessment, and lesson activities collectively. Stuart even believes checking for coherency and alignment should continue after the design process during and after the execution of the formative assessment plan: “Only then can you really fly over it and notice whether the cohesion is good enough.”

All the teachers made sure they understood the learning objectives by taking time to zoom in and formulate criteria for success and/or to zoom out to look for overarching learning objectives to bring everything together. Patricia, Stuart, and Jenna took time to look closely at learning objectives, transcend specifics, discover the coherence between objectives, or find broader objectives that can connect different learning objectives. Another group of teachers (Tracy, Stuart, Bernadette, and Lisa) took the time to formulate criteria for success so objectives would be more specific for teachers and students. Stuart analyzed former lessons, assignments, and previous misconceptions to get a good idea of the criteria for success that should be pursued. Lisa and Tracy formulated these criteria for success together with their students based on examples of work. Patricia, Jenna, and Tracy all mentioned that they think zooming in and out on learning objectives with colleagues is a valuable part of the design process.

After ensuring they comprehended the learning objectives, the teachers planned checkpoints that covered all the learning objectives (five teachers) or a selection of the learning objectives (one teacher). These checkpoints were linked explicitly to the learning objectives. Tracy clearly listed the learning objectives in her formative assessment plan to continuously verify that all activities aligned with what she wanted to accomplish with her class. For Stuart, it was essential to determine which learning objectives and criteria for success were being targeted in each lesson so he could refer to them in his instructions and assignments to the students. This will make it easier for students to reflect on their learning based on these learning objectives and criteria for success because they will be present in each lesson.

Most teachers designed their formative assessment to make it easy to integrate into their existing lesson plans. For example, Tracy said: “I actually really looked at which assignments I use to get them to practice and, based on those assignments, I decided which data collection activity would fit in easily.” Patricia planned activities as not only a means to collect data on students’ learning but also as an opportunity for students to repeat and rehearse for the selected learning objectives. Stuart and Lisa added that they designed formative assessment to collect data on learning as they would design test items for the final test.

Decision-Driven Data Collection

The teachers wanted to use the information gathered at the checkpoints to make three decisions:

Can I go on to the next topic/learning objective/chapter or do I need to spend more time on the current one? (the most common decision)

How can I best differentiate in my lessons to support all students in their learning?

What do students need to rehearse/repeat/practice/learn more to be prepared for the test?

Sometimes the information gathered at one checkpoint applied to several of these decisions.

When designing decision-driven data collection, the teachers were primarily concerned with how to measure what they needed to know to inform the next formative decision. For example, Lisa planned to use a drawing instead of a question to allow every student to show their learning on the topic unimpeded by their writing skills. Stuart did not want to analyze reflection forms since they only demonstrate students’ perceptions of learning, not their progress. Likewise, Jenna did not want to analyze summaries since she does not think they illustrate what students have learned but only whether they can summarize.

Stuart added that he plans decision-driven data collection through exit tickets or mini whiteboards because this gives him more information about all students’ learning than he could acquire by walking around while students do their homework in class.

Because it is quite a pitfall that if you see something go wrong with one person, you zoom in completely on that person. Before you know it, you are working with them for five or six minutes and, supposing they have 15 minutes to work independently, then you have time to see two such students. Then my harvest is two or three students. If I base my data collection on those whiteboards and exit tickets or other formative elements, I get more of an overall picture, and I find that much more valuable.

Patricia emphasized that it is crucial to gather rich information (e.g., about existing misconceptions) instead of a count of how many questions were answered incorrectly. However, she also mentioned that it is not always evident whether a teacher collects rich data in a formative assessment plan: “I believe it is rich information because you certainly address the mistakes you saw and the corresponding underlying misconceptions. Nevertheless, that is so self-evident that it is not mentioned here.”

Data used to inform decisions at each checkpoint is collected in different ways. Most of the teachers planned to use combinations of data collection methods to get a complete overview of all students’ learning, but they did this in various ways. Some wanted to combine data collection at the checkpoints with planned or on-the-fly daily checks during the lessons before the checkpoint. For example, Stuart plans daily checks of specific lesson goals by using mini whiteboards and exit tickets in addition to formal checkpoints that reveal learning on the overarching learning objectives. Lisa planned data collection at the checkpoint together with data collection via an online tool that helps her follow how students did on their assignments in the previous lessons. This helps her gather all the information she needs to decide on the next best step after a checkpoint.

However, other teachers only collect data at the planned checkpoints, at which time they aim to gather rich information on all students by using multiple data collection methods simultaneously. Tracy, for instance, combined walking around the classroom while students worked on their assignments with gathering written peer feedback on the same assignment at her first checkpoint. At the second checkpoint, she combined online assignments with answers from mini whiteboards.

The final variation, mentioned by Bernadette and Tracy, was using checkpoints not to gather new information on learning but to bring together all relevant information on learning from all prior lessons until that moment. This can result in rich information and help teachers make formative decisions that have consequences for all students based on information from all students. For example, when they cannot let every student speak during a lesson, they combine evidence from multiple lessons to get a complete overview of students’ speech.

Room for Adjustment and Improvement

Experienced teachers recognize how difficult it is to find time to make room for adjustment and improvement in an overfull curriculum. Follow-up is the part of formative assessment that takes the most time and thus the first thing to skip when time is tight, some teachers said. However, Jenna also stated that this is the essential part, and the other teachers seemed to agree since they all have strategies to help them make room for adjustment and improvement.

Three teachers prepared different possible follow-ups in advance to help them act on the checkpoints’ outcomes immediately. Patricia, Stuart, and Bernadette prepared slides, instructions, tutorials, and/or assignments for students so they would be ready once they knew what was needed to advance students’ learning. In this preparation, they considered different possible outcomes and differences between students (e.g., they prepared assignments for students who had reached particular learning objectives and those who still needed to). Instead of only focusing on the students who did not reach a targeted learning objective, all the teachers planned room for adjustment and improvement for all students. Patricia and Tracy created common assignments for their students that would simultaneously advance the learning of students who had reached a targeted learning objective as well as those who had not yet done so.

Tracy and Jenna warned that when a curriculum contains too much material, teachers have less flexibility in reacting to outcomes that are different than expected. Therefore, Jenna, Patricia, and Stuart advised their fellow teachers to be aware of and look for possibilities to leave something out or create more room in the curriculum when planning their lessons. Stuart mentioned many examples of how he makes room for adjusting and improving teaching and learning. For example, he and Jenna plan checkpoints to help them decide whether they can skip (parts of) instruction. Additionally, Stuart creates plans that will help the few students who need it by assigning tutorials and extra assignments to complete at home, instead of taking a whole lesson for extra instruction and practice.

Patricia, Jenna, and Stuart also delay more advanced tasks until formative assessment shows that students are ready to take this next step. That gives them more room for adjustment and improvement on the basics at the start of a lesson plan. Jenna:

If you do not do that well, do not lay that foundation properly, you can start working on another subject soon. However, the students still have not mastered that, and next comes something that practically goes back to this, so the foundation must be set correctly.

Stuart even finds room for adjustment and improvement after the final test, since he plans room for student improvement in the successive lessons and in assignments for future lessons that take place after the chapters’ final test: “Then they understand that it is not just for now and the next test, but learning is continuous.” According to Stuart, giving students room for improvement could even mean moving a test to a later date. Bernadette, Stuart, Lisa, and Jenna also make sure students take responsibility for designing their own room for improvement by having students think about the best follow-up. Lisa also wants to give students information based on the checkpoints that they can use to make their own improvement plans.

Conclusions and Discussion

This study focused on the research question: How do experienced teachers design formative assessment plans that are aligned, include decision-driven data collection, and leave room for adjusting and improving teaching and learning? Through the interviews with experienced teachers, seven strategies were found that they used to design formative assessment plans that meet these criteria:

Starting from existing and consciously planned series of lessons.

Taking time to understand learning objectives.

Using existing lesson activities to elicit evidence of student learning instead of designing new ones.

Precisely measuring what they need to know to inform their formative decisions.

Combining various sources of information about all the students’ learning to inform their formative decisions.

Preparing different follow-ups to match different possible outcomes from formative assessment.

Purposely creating time and space for improvement and adjustment in the curriculum.

To discover the difficulties teachers can encounter in designing formative assessment plans that are aligned, include decision-driven data collection, and leave room for adjusting and improving teaching and learning, sub-question 1 asked which pitfalls can threaten the design of formative assessment plans. One frequently found pitfall was that formative assessment plans were incomplete or unclear. A formative assessment plan needs to clearly describe, explicitly link, and consciously match learning objectives, lesson activities, and assessment in line with what Biggs ( 1996 ) defined as constructive alignment. Additionally, a formative assessment plan needs to clearly describe, explicitly link, and consciously match intended formative decisions, data collection and follow-up to meet the criteria of decision-driven data collection and room for adjustment and improvement.

The other pitfalls that were found were:

A mismatch between the learning objectives, lesson activities, and/or assessment activities that resulted in less alignment;

The data that teachers planned to collect about student learning was not the information needed to inform the corresponding formative decision and therefore did not constitute decision-driven data collection;

The information on student learning gathered through the planned data collection did not become available or visible to the teacher to help inform the corresponding formative decision and therefore was not part of decision-driven data collection; and

The planned room for adjustment and improvement was not followed by another checkpoint to establish whether it had helped to achieve the expected learning objectives and assess whether there had been enough room for adjustment and improvement.

Sub-question 2 was aimed at discovering what experienced teachers do to design formative assessment plans that are aligned, include decision-driven data collection, and leave room for adjustment and improvement. The interviewed teachers mentioned seven design strategies that help them avoid most of the pitfalls listed above, as well as two additional pitfalls they experienced while designing formative assessment. The design strategies will be presented in more detail per criterion, along with the implications for improving the design steps. All implications are shown in a revised version of the design steps in Online Resource 2. The pitfalls that these teachers have not yet found a solution for will also be presented, together with the implications for future research.

Design Strategies for Achieving Alignment

The six interviewed teachers used three design strategies to ensure sufficient alignment in their formative assessment plans:

Start with existing and consciously planned series of lessons. When these are aligned with the learning objectives, there is a solid foundation for the formative assessment plan.

Take time to understand the learning objectives thoroughly. Aligning formative assessment plans requires teachers to take a good look at learning objectives, zoom in and out, transcend specifics to discover the coherence between objectives, and make them more specific for teachers and students to work with. The learning objectives are the thread that links everything together, and teachers use them during design and in their lessons to ensure and continuously check alignment.

Do not regard formative assessment as something new or extra that must be added to lesson plans. Instead, take the opportunity to re-purpose and evaluate existing lesson activities, investigating whether they can be used to elicit evidence about students’ learning as part of a formative assessment plan.

These findings have implications for design steps 1, 3, and 5:

Step 1: Describe the context. Based on these outcomes regarding alignment, the first design step should suggest that teachers start their design with an existing series of lessons that has already been designed, checked, implemented, and evaluated with alignment in mind. Additionally, it should emphasize that teachers should not only describe the learning objectives but take time to understand them thoroughly by zooming in and out to look for overarching learning objectives that bring everything together.

Step 3: Design rich data collection. To contribute to alignment, design step 3 should include the suggestion to re-purpose learning and assessment tasks that are already part of the existing series of lessons and use them formatively instead of adding on formative assessment.

Step 5: Take a helicopter view. This step is now situated at the end of the design process. However, according to the teachers, this step encompasses the total design process since they continuously check for alignment during the design and execution of the formative assessment plan.

Design Strategies for Decision-Driven Data Collection

The experienced teachers used two design strategies to design decision-driven data collection (Wiliam, 2013 , 2014 ):

Ensure that the data collection measures precisely what you need to know by matching the form and content of the data collection methods to the formative decision the teacher wants to make.

Combine various information on the learning of all students to inform formative decisions solidly. The teachers reported three approaches they designed to help them do this:

Teachers combined information from different formative checks — planned or on the fly from prior lessons — with data collection conducted up to and around the checkpoints to inform their formative decisions.

Teachers collected data only at the planned checkpoints, at which time they aim to gather rich information on all students by using multiple data collection methods simultaneously.

Teachers did not use checkpoints to gather new information on learning but used them to bring together all relevant information on learning from all prior lessons until that moment.

If data is consciously collected to match and inform the formative decisions, all these variations can be considered decision-driven data collection. The important overarching design strategy is that teachers combine information from multiple data collections to inform their formative decisions.

As for decision-driven data collection, the two strategies the experienced teachers mentioned are already incorporated in the design steps. However, based on this study, the importance of these strategies can be emphasized in design steps 3 and 5:

Step 3: Design rich data collection. To emphasize the collection of rich data, this step could recommend combining multiple data collection methods positioned at the checkpoint and/or in the lessons before the checkpoint to get an honest, reliable, and complete overview of student learning.

Step 5: Take a helicopter view. Since design step 3 already mentions that teachers should consider if the data collection measures what they need to know, this could be emphasized by including this as an extra check in design step 5.

Design Strategies for Creating Room for Adjustment and Improvement

Most of the experienced teachers in this study acknowledged the difficulty of creating room for adjustment and improvement (Veugen et al., 2021 ). As a result, the three design strategies the teachers used to create room for adjustment and improvement focus on increasing the possibility to use the outcomes of the checkpoints:

Prepare various follow-ups in advance so that teachers always have assignments, tutorials, and instructions ready to go that are suitable for different outcomes and students.

Consciously create space and time in the curriculum for adjustment and improvement. The teachers reported three approaches that helped them do this:

Teachers planned checkpoints with data collection to help them decide whether they can skip, speed up, or delay the curriculum to create room for adjustment and improvement.

Teachers planned to delay more advanced tasks until they were sure students had acquired the basics.

Teachers planned space for student improvement outside the classroom or after the final test.

Most teachers thought it was also important that students learned to make their own improvement plans and become co-responsible for the learning process.

As for the design steps, the design strategies for creating room for adjustment and improvement may lead to changes in design steps 4 and 5:

Step 4: Make room for adjustment and improvement. Teachers might find it helpful to prepare follow-ups in advance for multiple possible outcomes and all students at each checkpoint. This design step could incorporate this suggestion.

Step 5: Take a helicopter view. In this step, teachers could be encouraged to think about risks that could hinder the creation of room for adjustment and improvement after a checkpoint and how to prevent this or prepare an alternate plan.

Other Pitfalls

Apart from the design strategies, the interviews also disclosed that even experienced teachers face difficulties in designing formative assessment. The two pitfalls that were mentioned can make it hard for teachers to design and implement formative assessment plans that are aligned, include decision-driven data collection, and make room for adjustment and improvement, despite the strategies they already use.

Narrow Formative Decisions Form the Foundation for Decision-Driven Data Collection

The teachers reported that three formative decisions at the checkpoints were the foundation for their formative data collection. These decisions were limited to: “Can I go on with the next lesson/chapter/learning objective or is something else needed?” However, to ensure decision-driven data collection contributes to creating the best suitable follow-up for students, the decisions should go beyond “Can I go on or not” and include “What is the best way to move forward?” Therefore, the decisions that are the foundation of decision-driven data collection should have a double focus.

The first focus refers to the “yes or no” decision (e.g., “Can I go on to the next chapter?” or “Is it necessary to differentiate?”). To this, teachers need to add a “how to continue” decision (e.g., “What is the best way to go forward?” or “How should I differentiate in the next lesson?”, respectively). When teachers incorporate both in their formative decisions, it is more likely that decision-driven data collection will provide the information needed to choose the follow-up that best suits students’ needs.

The second focus involves collaboration. Teachers can use conversations with their students to discover the best suitable follow-up (Allal, 2020). Formative assessment will achieve its full potential when it is perceived as support for development and learning and as a way to discover how best to suit students’ needs, rather than being used solely for control and accountability and focused solely on the go or no-go decision (Ninomiya, 2016).

Limited Room for Improvement

While one teacher advocated for perceiving learning and formative assessment as a continuous process that transcends specific chapters or series of lessons, the other teachers often wondered “How much room for improvement can I give my students?” or “How can I justify continuous and differentiated learning processes that suit students’ needs and still achieve all the learning objectives with all my students within a specific time?”

The differences between these teachers and the choices they make can be explained in several ways. The teachers’ pedagogical foundation and their knowledge and skills regarding formative assessment can play a role, as can the amount of space they perceive to define and shape their work within the school context (i.e., their professional agency) (Heitink et al., 2016 ; Oolbekkink-Marchand et al., 2022 ). These differences can be used to explain why one experienced teacher creates room for improvement during a new chapter or delays a test, while others wonder how much room for improvement they can give their students before they must move forward to the next subject, lesson, or learning objective. The amount of space and freedom teachers experience to let formative assessment lead their teaching decisions and the role teachers’ agency and experience play in this process would be interesting subjects for future research.

Limitations and Implications for Future Research

A first limitation of this study was that the conclusions about sub-question 1 are based on formative assessment plans that were only a reality on paper. The outcomes of the analysis were based on the complete plans and the possible risks perceived in the incomplete plans. Since the formative assessment plans are merely a written plan rather than a reflection of action, they do not always show what a teacher really planned to do. During the interviews, it became clear that when teachers explained their formative assessment plans, a lot of information about them had not been written down but was still essential to comprehend what they planned to do. Since only six teachers were interviewed, it is possible that asking more teachers to explain their formative assessment plans might have led to fewer, more, or different pitfalls in designing formative assessment.

Another limitation of this study was the inclusion of only six experienced teachers, mainly from theoretical subjects. It is unclear whether less experienced teachers or teachers of more practical subjects would use the same or other design strategies to meet the three criteria. Future research could focus on collecting a range of design stories from experienced and new teachers, with or without prior knowledge and skills concerning formative assessment, of both theoretical and practical subjects.

In conclusion, it is interesting to note a paradox about formative assessment. The current study showed that when teachers consciously prepare, plan, and match their formative assessment in advance, this helps them to achieve alignment, decision-driven data collection, and room for adjustment and improvement. However, this contradicts Black and Wiliam’s ( 2009 ) definition, which emphasizes that formative assessment “is concerned with the creation of, and capitalization upon, ‘moments of contingency’” (p. 10). When formative assessment is extensively planned, are teachers still able to pick up on the element of surprise? Decision-driven data collection can present them with different outcomes than expected, so an unforeseen follow-up may be needed to best suit students’ needs. How can teachers keep an inquiring and open mind when they collect and analyze evidence of learning after they planned their formative decisions, data collection, and follow-ups in detail in advance?

Thus, it would be interesting for future research to discover how teachers use information about students’ learning to inform their decisions. Do they analyze this information with enough openness and curiosity to really discern students’ needs and adjust their planned follow-up based on unexpected outcomes?

Data Availability

The data that support the findings of this study are available from the corresponding author, [JS], upon reasonable request. The data will not become publicly available before the end of the research project to which this study is connected.

Allal, L. (2020). Assessment and the co-regulation of learning in the classroom. Assessment in Education: Principles, Policy & Practice, 27 (4), 332–349. https://doi.org/10.1080/0969594X.2019.1609411


Antoniou, P., & James, M. (2014). Exploring formative assessment in primary school classrooms: Developing a framework of actions and strategies. Educational Assessment, Evaluation and Accountability, 26(2), 153–176. https://doi.org/10.1007/s11092-013-918

Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education , 32 (3), 347–364. https://doi.org/10.1007/bf00138871

Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21 (1), 5–31. https://doi.org/10.1007/s11092-008-9068-5

Brooks, J., McCluskey, S., Turley, E., & King, N. (2015). The utility of template analysis in qualitative psychology research. Qualitative Research in Psychology, 12 (2), 202–222. https://doi.org/10.1080/14780887.2014.955224

Furtak, E. M., Circi, R., & Heredia, S. C. (2018). Exploring alignment among learning progressions, teacher-designed formative assessment tasks, and student growth: Results of a four-year study. Applied Measurement in Education , 31 (2), 143–156. https://doi.org/10.1080/08957347.2017.1408624

Gulikers, J., Biemans, H., Wesselink, R., & van der Wel, M. (2013). Aligning formative and summative assessments: A collaborative action research challenging teacher conceptions. Studies in Educational Evaluation, 39 (2), 116–124. https://doi.org/10.1016/j.stueduc.2013.03.001

Heitink, M. C., Van der Kleij, F. M., Veldkamp, B. P., Schildkamp, K., & Kippers, W. B. (2016). A systematic review of prerequisites for implementing assessment for learning in classroom practice. Educational Research Review, 17 , 50–62. https://doi.org/10.1016/j.edurev.2015.12.002

Ninomiya, S. (2016). The possibilities and limitations of assessment for learning: Exploring the theory of formative assessment and the notion of “closing the learning gap”. Educational Studies in Japan: International Yearbook, 10, 79–91.

Offerdahl, E. G., McConnell, M., & Boyer, J. (2018). Can I have your recipe? Using a fidelity of implementation (FOI) framework to identify the key ingredients of formative assessment for learning. CBE Life Sciences Education, 17 (16), 1–9. https://doi.org/10.1187/cbe.18-02-0029

Oolbekkink-Marchand, H., van der Want, A., Schaap, H., Louws, M., & Meijer, P. (2022). Achieving professional agency for school development in the context of having a PhD scholarship: An intricate interplay. Teaching and Teacher Education, 113 , 103684. https://doi.org/10.1016/J.TATE.2022.103684

Ruiz-Primo, M. A., & Furtak, E. M. (2007). Exploring teachers’ informal formative assessment practices and students’ understanding in the context of scientific inquiry. Journal of Research in Science Teaching, 44 (1), 57–84. https://doi.org/10.1002/tea.20163

Van Den Berg, M. (2018). Classroom formative assessment: A quest for a practice that enhances students’ mathematics performance . [Doctoral dissertation, University of Groningen]. University of Groningen Research Outputs. https://research.rug.nl/en/publications/classroom-formative-assessment-a-quest-for-a-practice-that-enhanc

van der Steen, J., van Schilt-Mol, T., van der Vleuten, C., & Joosten-ten Brinke, D. (2022). Supporting teachers in improving formative decision-making: Design principles for formative assessment plans. Frontiers in Education , 7 , 925352. https://doi.org/10.3389/feduc.2022.925352

Van Es, E. A. (2012). Examining the development of a teacher learning community: The case of a video club. Teaching and Teacher Education, 28 , 182–192. https://doi.org/10.1016/j.tate.2011.09.005

Veugen, M. J., Gulikers, J. T. M., & Den Brok, P. (2021). We agree on what we see: Teacher and student perceptions of formative assessment practice. Studies in Educational Evaluation, 70 , 101027. https://doi.org/10.1016/j.stueduc.2021.101027

Wiliam, D. (2013). Assessment: The bridge between teaching and learning. Voices from the Middle , 21 (2), 15–20. https://library.ncte.org/journals/vm/issues/v21-2/24461

Wiliam, D. (2014). Formative assessment and contingency in the regulation of learning processes. Paper presented at the annual meeting of the American Educational Research Association, Philadelphia, PA. http://dylanwiliam.org/Dylan_Wiliams_website/Papers_files/Formative%20assessment%20and%20contingency%20in%20the%20regulation%20of%20learning%20processes%20%28AERA%202014%29.docx

Wylie, E. C., & Lyon, C. J. (2015). The fidelity of formative assessment implementation: Issues of breadth and quality. Assessment in Education: Principles, Policy & Practice, 22 (1), 140–160. https://doi.org/10.1080/0969594X.2014.990416


Acknowledgements

We thank all who contributed to completing this study. We give special thanks to the schools that are our committed partners in learning about and designing formative assessment plans, and to the teachers who took the time to share their formative assessment plans and design stories.

This work was supported by the Taskforce for Applied Research SIA, or Regieorgaan SIA (grant number RAAK.PRO03.057).

Author information

Authors and Affiliations

School of Health Professions Education, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands

Janneke van der Steen & Cees van der Vleuten

Research Centre Qualities of Teachers, School of Education, HAN University of Applied Sciences, Nijmegen, the Netherlands

Janneke van der Steen & Tamara van Schilt-Mol

Department of Online Learning and Instruction, Faculty of Educational Sciences, Open Universiteit, Heerlen, the Netherlands

Desirée Joosten-ten Brinke


Contributions

Janneke van der Steen: conceptualization, methodology, validation, formal analysis and investigation, data curation, writing — original draft preparation, visualization. Tamara van Schilt-Mol: conceptualization, methodology, supervision, validation, writing — review and editing, project administration, funding acquisition. Cees van der Vleuten: conceptualization, supervision, validation, writing — review and editing. Desirée Joosten-ten Brinke: conceptualization, supervision, validation, writing — review and editing.

Corresponding author

Correspondence to Janneke van der Steen .

Ethics declarations

Ethical Approval and Consent

Approval was obtained from the ethics committee of HAN University of Applied Sciences. The procedures used in this study adhere to the tenets of the Declaration of Helsinki. Informed consent was obtained from all individual participants in the study.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 228 KB)

Supplementary file 2 (DOCX 1.15 MB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

van der Steen, J., van Schilt-Mol, T., van der Vleuten, C. et al. Designing Formative Assessment That Improves Teaching and Learning: What Can Be Learned from the Design Stories of Experienced Teachers?. J Form Des Learn 7 , 182–194 (2023). https://doi.org/10.1007/s41686-023-00080-w


Accepted : 28 September 2023

Published : 17 October 2023

Issue Date : December 2023

DOI : https://doi.org/10.1007/s41686-023-00080-w


  • Assessment for learning
  • Decision-driven data collection
  • Constructive alignment
  • Formative assessment plan
  • Educational design
  • Teacher learning


The effectiveness of formative assessment for enhancing reading achievement in K-12 classrooms: A meta-analysis

Abstract

This quantitative synthesis included 48 qualified studies with a total sample of 116,051 K-12 students. Aligned with previous meta-analyses, the findings suggested that formative assessment generally had a positive though modest effect (ES = + 0.19) on students’ reading achievement. Meta-regression results revealed that: (a) studies with 250 or fewer students yielded significantly larger effect sizes than large-sample studies, (b) the effect of formative assessment embedded with differentiated instruction equated to an increase of 0.13 SD in the reading achievement score, and (c) integration of teacher- and student-directed assessment was more effective than assessment initiated by teachers alone. Our subgroup analysis indicated that the effect sizes of formative assessment interventions on reading differed significantly between Confucian-heritage culture and Anglophone culture and showed divergent effective features. This result cautions against generalizing formative assessment across different cultures without adaptation. We suggest that effect sizes be calculated and intervention features be investigated in various cultural settings so that practitioners and policymakers can implement tailored formative assessment.

Introduction

In an era of reconfiguring the relationship between learning and assessment, spurred by quantitative and qualitative evidence, formative assessment has been proffered as a means to meet the goals of lifelong learning and to promote high performance and high equity for all students ( OECD, 2008 ). It has gained momentum among researchers and practitioners in various cultural contexts. In an oft-cited ‘configurative review’ ( Sandelowski et al., 2012 ) on formative assessment, Black and Wiliam (1998) reported that effect sizes of formative assessment on student achievement were between 0.4 and 0.7, ranging over age groups from 5-year-olds to university undergraduates. The impact of teachers’ formative evaluation on student achievement was ranked third, with an effect size of 0.9, among 138 learning activities influencing student achievement ( Hattie, 2009 ). Also, feedback, as an essential part of formative assessment, has been found to enhance students’ learning ( Hattie and Timperley, 2007 ; Hattie, 2009 ; Wisniewski et al., 2019 ). The large prima facie effect sizes reported for raising standards of learning laid a foundation for evidence-based assessment policy reform, and formative assessment has attracted ever-widening attention in various countries and regions.

In the past three decades, only four comprehensive reviews have reported effect sizes of formative assessment on reading achievement, which ranged from +0.22 to +0.7 ( Fuchs and Fuchs, 1986 ; Black and Wiliam, 1998 ; Kingston and Nash, 2011 ; Klute et al., 2017 ). Yet in a literature review of 15 studies commissioned by the Australian Institute of Teaching and School Leadership (AITSL), the researchers stated that the impact of formative assessment on reading achievement was discouraging because no effective tools could be identified and some programs were entangled with technologies ( Lane et al., 2019 ). The interpretations from the prior meta-analyses and this literature review seem to conflict. Different school subjects require domain-specific effective formative assessment interventions ( Wiliam, 2011 ). Arguably, how formative assessment enhances students’ reading achievement remains unclear, a problem that can be addressed by an updated and comprehensive meta-analysis. Lane et al. (2019) were concerned that the effect of formative assessment on reading could not be distinguished from that of digital technology when the two were mixed in a program. This issue can be settled by treating the involvement of digital technology in formative assessment practices as a moderator, which compares programs with and without technology. Given the importance of formative assessment and the need for further statistical evidence on the reading subject ( Clark, 2010 ; Van der Kleij et al., 2017 ; Black and Wiliam, 2018 ; Andrade et al., 2019 ), the purpose of this review, which included literature in English and Chinese up to 2021, is to assess evidence from rigorous evaluations to determine the magnitude of experimental effects of formative assessment on students’ reading performance and to identify the features that influence its effectiveness. Noticeably, international comparison of formative assessment practices requires cultural sensitivity ( Shimojima and Arimoto, 2017 ). In this meta-analysis, we used the three factors suggested by Cheung et al. (2021) to frame the features of formative assessment: substantive factors (student characteristics, grade level, type of intervention, digital technology, program duration, differentiated instruction), methodological factors (sample size, research design), and other factors (publication type, cultural setting).

Working definition of formative assessment

Since the term formative assessment has been used widely and diversely in the literature and because its classroom practice can vary within different educational settings, it is important to provide a working definition of the term to guide this review. Given the nebulous nature of formative assessment, a working definition of formative assessment is proposed based on the prior definitions in the past three decades. The essential statements of the 19 definitions (shown in Supplementary material ) were compiled aligned with a succinct framework. Jönsson (2020) suggested that definition of formative assessment should include evaluative judgment (qualitative judgment) occurring in daily teacher-student interactions and a psychometric understanding of assessment depending on aggregating evidence of student learning collected by teachers. To follow this advice and identify potential studies, this review culls the more comprehensive descriptions under each element of the suggested definitions. Formative assessment in this review is broadly defined as

an active and intentional process with formal and informal classroom practices/activities harvesting evidence of students’ learning progress by evaluative/qualitative judgment and a psychometric understanding of various assessments (what) during teaching and learning (when), in which teachers (who) continuously and systematically elicit, interpret and use evidence about students’ learning and conceptual organization (how) to guide their pedagogical plans (why), and/or students (who) work with/without teachers or peers to adjust their current learning tactics (how) with an effort to improve their achievements and self-regulate their learning (why) ( Popham, 2008 ; Black and Wiliam, 2009 ; Chappius, 2009 ; Moss and Brookhart, 2009 ; Cizek et al., 2019 ).

Evaluative judgment refers to the daily teacher-student interactions that elicit evidence about learners’ progress, for instance, feedback, discussions, presentations, and other student artifacts. Psychometric assessments entail quizzes, tests, or indirect measurements that necessitate interpretation of outcomes ( Jönsson, 2020 ). In this sense, formative uses of benchmark assessments and summative assessments ( Wiliam, 2011 ) were included if they met all the selection criteria in this review. Considering that formative assessments are classroom practices intended to identify students’ learning gaps and improve their learning, the participants can be teachers, students, or their peers, as well as teachers and students together. This review clarifies types of intervention to compare the effectiveness of different participants’ engagement.

Alternative terminologies have emanated from different emphases to serve a common underlying formative purpose ( Kingston and Nash, 2011 ). It is worth mentioning that the term assessment for learning (AfL) is often used interchangeably with formative assessment to emphasize the function of formative assessment to improve student learning ( Heritage, 2010 ; Bennett, 2011 ). The term AfL, first used by Harry Black ( Black and Wiliam, 1986 ), was advocated by the Assessment Reform Group (ARG) in the United Kingdom. Another term, assessment as learning (AaL), was coined to signal the active role students play in the formative assessment process ( Earl, 2012 ). Assessment for, as, and of learning each delineates the purpose for which the assessment is carried out. By contrast, formative assessment and summative assessment are distinguished by the functions they actually serve ( Wiliam, 2011 ). Bennett (2011) suggested that it was not instructive to equate AfL with formative assessment and assessment of learning with summative assessment. However, a thorough exploration of the nuances between these two distinctions is beyond the scope of this paper. To include potentially qualified studies as broadly as possible, albeit labeled by alternative terms of assessment, the terms “formative evaluation,” “feedback,” “AfL,” and “assessment as learning” were used as key words in this review.

Previous reviews of formative assessment on reading achievement

From the literature review, eight major reviews on formative assessment were found in this area ( Fuchs and Fuchs, 1986 ; Kluger and DeNisi, 1996 ; Black and Wiliam, 1998 ; Hattie, 2009 ; Kingston and Nash, 2011 ; Heitink et al., 2016 ; Klute et al., 2017 ; Sanchez et al., 2017 ). However, only four out of the eight comprehensive reviews encompassed the effect sizes of formative assessment in reading achievement ( Fuchs and Fuchs, 1986 ; Black and Wiliam, 1998 ; Kingston and Nash, 2011 ; Klute et al., 2017 ). These four reviews indicated the positive effects of formative assessment on reading achievement, with effect sizes that ranged widely, from +0.22 to +0.7 ( Table 1 ).

Summary of major meta-analyses of the effects of formative assessment on reading achievement.

Fuchs and Fuchs (1986) generated 96 effect sizes from 21 controlled studies, with an average weighted effect size of +0.70. The authors described that 8 of the 21 investigations focused solely on reading, 4 on reading and math, and 1 on reading, math, and spelling, with no specific effect size calculated for reading. This meta-analysis focused upon special education, as 83% of the 3,835 investigated subjects belonged to the special educational needs (SEN) population, so it is inappropriate to generalize the findings to the student population at large. Secondly, as the authors acknowledged, the 96 effect sizes generated from the 21 controlled studies were derived from analyses of divergent quality: 69 effect sizes were of fair quality and 8 of poor quality, together accounting for around 80% of all the effect sizes. Thus, the average effect size of 0.70 from the 21 studies examined was drawn from research that was methodologically unsound ( Dunn and Mulvenon, 2009 ). The limitation of specialized sample groups and the quality of the studies reviewed cast doubt on the validity of the large effect size.

Black and Wiliam’s (1998) review of more than 250 articles related to formative assessment was a seminal piece to prove the positive effects of formative assessment on student achievement. The authors presented eight articles to support their conclusions pertaining to the efficacy of formative assessment without performing any quantitative meta-analysis techniques. The effect size that ranged from 0.40 to 0.70 concluded from their analysis was equivocal and inadequate to be applied in different contexts ( Dunn and Mulvenon, 2009 ; Kingston and Nash, 2011 ). This review did not clarify the subject-based effect sizes. Hence, no substantiated effect sizes on reading achievement could be retrieved. Nevertheless, this ‘configurative review’ ( Sandelowski et al., 2012 ) did encourage more widespread empirical research in the area of formative assessment ( Black and Wiliam, 2018 ).

Kingston and Nash (2011) selected 13 of over 300 studies in grades K-12 to reexamine the effects between 0.40 and 0.70. Their moderator analyses indicated that the effect size of formative assessment in English language arts (ES = + 0.32) was larger than those in mathematics (ES = + 0.17) and science (ES = + 0.09). Briggs et al. (2012) commented that one of the marked flaws that threatened Kingston and Nash’s conclusion was their study retrieval and selection approach, which might explain the paucity of Kingston and Nash’s research base ( Kingston and Nash, 2012 ). This problem could be solved by referring to a subset of the studies suggested by Black and Wiliam (1998) .

The latest meta-analysis involving reading achievement was conducted by the US Department of Education ( Klute et al., 2017 ). The research team identified 23 rigorous studies on reading, math, and writing at the elementary level to demonstrate the positive effects of formative assessment interventions on student outcomes from 1988 to 2014. Though the review was stated to cover studies published between 1988 and 2014, its finalized list for reading only extended to 2007. Of the 23 studies in various subject areas, nine focused on reading, with an average effect size of +0.22. Interestingly, their report revealed that other-directed formative assessment was more effective (ES = + 0.41) than student-directed formative assessment (ES = –0.15). Other-directed formative assessment encompassed educators or computer software programs, whilst student-directed formative assessment referred to self-assessment, self-regulation, and peer assessment. These novel categories of formative assessment provided new insights for the moderator analyses. The report was rigorous, with stringent controls on selection criteria; however, it solely covered the elementary level and restricted the geographical research locations to Anglophone countries.

Moderator variables

To warrant the quality of a meta-analysis, a rationale for the coding schema should be provided ( Pigott and Polanin, 2019 ). Three factors suggested by Cheung et al. (2021) were set to frame the features of formative assessment.

Methodological factors

Methodological factors describe research design and sample size. One possible factor that might cause variance is the research design of divergent studies ( Abrami and Bernard, 2006 ). Two groups of research designs were identified in this review: RCT (randomized controlled trial) and QED (quasi-experimental design). Of particular concern, cluster (school-level, classroom-level, and teacher-level) randomized controlled trials with student-level outcome measures were coded as quasi-experimental studies. Another potential source of variation may lie in the sample size, which was reported to be negatively correlated with effect sizes in studies of reading programs ( Slavin and Smith, 2009 ). Following the tradition of a previous meta-analysis ( Cheung and Slavin, 2016 ), this review coded studies with 250 students or fewer as small sample; the others were taken as large sample.

Substantive factors

Substantive factors depict the background of a study such as population, context and duration. Six program features identified from some seminal meta-analyses on reading and formative assessment ( Klute et al., 2017 ) were included in this review.

Student characteristics, grade level and program duration

Students in the included studies were categorized as at-risk or mainstream students. At-risk students were those who had reading difficulties or performed poorly in regular classrooms; the others were coded as mainstream students. Grade level was divided into kindergarten, elementary, and middle/high levels. Program duration used 1 year as a threshold to classify long and short programs: programs that lasted less than 1 year were coded as short; the rest were long.

Differentiated instruction

Formative assessment is a “gap minder” ( Roskos and Neuman, 2012 ) enabling teachers and students to identify the gap between where students are and where they need to go in their reading development ( Wiliam and Thompson, 2007 ). Consequently, teachers can stay alert to these gaps and differentiate their instruction for various students. Differentiated instruction is taken as an optional component of formative assessment practice. In our review, several teacher practices were coded as “without” differentiated instruction, for instance, teachers who kept track of students’ learning gaps without changing their teaching plan, or who merely monitored interim/benchmark assessment results without further action to differentiate or individualize their teaching for different students in line with the assessment data.

Type of intervention

Tethered to the main sources of formative assessment practices ( Andrade et al., 2019 ) in the two latest integrated formative assessment meta-analyses ( Klute et al., 2017 ; Lee et al., 2020 ), type of intervention was coded as teacher-directed, student-directed, or integrated (teacher and student assessment). Specifically, teacher-directed assessment referred to teachers who provided feedback, interim/benchmark assessment, or other resources to gauge students’ learning, be it computer-based or paper-based, and/or who individualized or differentiated instruction in students’ classroom learning. Student-directed assessment mainly took the form of peer- or self-assessment and young learners’ meaning-focused group reading activity ( Connor et al., 2009 ). Integrated practices involved both teacher and student in the assessment process.

Digital technology

Various digital technologies have been explored and applied in K-12 formative assessment practice in the 21st century ( Spector et al., 2016 ). A newly published article suggested that digital technology could be conducive to reading for young children but not for older children ( See et al., 2021 ). This review cross-checks this result by including more rigorous studies on reading from various cultural contexts.

Other factors

Other factors are external variables that might influence the variance of effect sizes. We included publication type and cultural setting, which had not been assessed in previous meta-analyses on formative assessment.

Publication type

The validity of results from a meta-analysis is often threatened by the presence of publication bias. Put succinctly, publication bias refers to the tendency for studies with large or statistically significant effects to be published more readily than studies with small or null effects. This meta-analysis therefore included both published and unpublished literature (technical reports, dissertations, and conference reports).

Cultural settings

Formative assessment was introduced and developed in Anglophone culture, represented by the United Kingdom and the United States. In light of previously reviewed policies in Asia-Pacific regions, it is safe to assume that formative assessment has been introduced and implemented in Asia, especially in countries or regions shaped by Confucian-heritage culture (CHC), which is strongly exam-oriented ( Biggs, 1998 ). Teachers in CHC settings are often burdened with high-stakes test pressure, and it might be more demanding for them to believe that formative assessment is meant to facilitate learning rather than accredit it ( Crossouard and Pryor, 2012 ). This review, as a first of its kind, attempted to compare the interventions in Anglophone and Confucian-heritage cultures. Studies conducted in Anglophone culture are from Barbados (1), Germany (3), Spain (1), Sweden (1), the United Kingdom (1), and the United States (30), while studies in CHC settings are from Hong Kong (4), South Korea (1), and Taiwan (3). Of note, although Germany, Spain, and Sweden are not English-speaking countries, we still categorized them as Anglophone culture in stark contrast to the exam-driven CHC. Surprisingly, few studies from Mainland China could be located that met our inclusion criteria. The reasons for this were threefold. First, some marginally qualified studies were carried out by only one teacher in two classes, so the teacher effect could not be evened out. Second, some studies did not report results for reading achievement because they were not statistically significant, as explicitly stated by the authors. Third, the majority of formative assessment projects in China were based in higher education. The culling process implied some new directions for future research and reviews.

Methodological and other factors are mainly the extrinsic factors that can be applied to meta-analyses in other research fields. The substantive factors include intrinsic features that are commonly seen in formative assessment activities. These moderators provide a comparatively holistic set of features that might influence the effect of formative assessment on students’ reading achievement.

Rationale for present review

Due to the paucity of studies, the lack of stringent selection criteria, and the limitations of their samples, the aforementioned comprehensive reviews call for more rigorous studies to be investigated to reveal an up-to-date effect size for the subject of reading. Existing subject-based reviews have covered mathematics ( Gersten et al., 2009 ; Burns et al., 2010 ; Wang et al., 2016 ; Kingston and Broaddus, 2017 ), writing ( Graham et al., 2015 ; Miller et al., 2018 ), and science ( Hartmeyer et al., 2018 ), but not reading.

To provide a more comprehensive understanding of the effectiveness of formative assessment for enhancing reading achievement, this study attempted to elicit exemplary formative assessment practices by applying rigorous, consistent inclusion criteria to identify high-quality studies. Our review, in an effort to sketch a comprehensive picture of the effects of formative assessment on reading, statistically consolidated the effect sizes of qualified studies in terms of methodological and substantive features. The present study attempts to address two research questions:

  • (1) What is the effect size of formative assessment on K-12 reading programs?
  • (2) What study and research features moderate the effects of formative assessment interventions on student reading achievement?

The present review employed meta-analytic techniques suggested by Glass et al. (1981) and Lipsey and Wilson (2001) . Comprehensive Meta-analysis Software Version 3.0 ( Borenstein et al., 2013 ) was adopted to compute effect sizes and to carry out various meta-analytical tests. The following steps were taken during the meta-analytic procedures: (1) scan potential studies for inclusion using preset criteria; (2) locate all possible studies; (3) code all qualified studies based on their methodological and substantive features; (4) calculate effect sizes for all selected studies for additional combined analyses; (5) perform comprehensive statistical analyses encompassing both average effects and the relationships between effects and study features.

Criteria for inclusion

To be included in this review, the following inclusion criteria were preset.

  • (1) Studies that examined the effects of formative assessment or AfL on students’ reading outcomes.
  • (2) Studies can be directed by a single party, be it teacher or student (peer- or self- assessment), or by collaboration of teachers and students.
  • (3) Classroom practices align with the definition of formative assessment in this review.
  • (4) The studies involved students in kindergarten, elementary and secondary education.
  • (5) Reading programs included English as a native or a foreign language in their reading courses, or reading courses in students’ mother tongue.
  • (6) Studies could have taken place in any country or region, but the report had to be available in English or Chinese.
  • (7) Treatment/experiment group(s) embedded with formative assessment activities was/were compared with control group(s) using standard/traditional methods (aka business-as-usual groups).
  • (8) Pretest data had to be provided ( What Works Clearinghouse, 2020 ), unless studies used random assignment of at least 30 units (individuals, classes, or schools) and reported no indications of initial inequality, in line with ESSA (Every Student Succeeds Act) evidence standards ( ESSA, 2015 ). Studies with pretest differences of more than 50% of a standard deviation were excluded because large pretest differences cannot be adequately managed even with analyses of covariance, as the underlying distributions may be fundamentally different ( Shadish et al., 2002 ) (see the sketch after this list for an illustrative baseline check).
  • (9) Two teachers (each in one classroom) should be involved in each treatment group to even out the teacher effect in treatment effects. Of note, some studies which only examined the students’ roles in formative assessment with only one teacher in each group were included.
  • (10) Study interventions had to be replicable in realistic school settings (i.e., in a usual classroom setting, with students taught by their usual teacher, in controlled experiments). Studies equipping experimental groups with extraordinary amounts of aid (e.g., additional staff to ensure proper implementation), where a Hawthorne effect would be generated, were excluded.
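To make criterion (8) concrete, the sketch below shows one way the baseline-equivalence screen could be operationalized. The function names, data, and threshold handling are our own illustrative choices, not part of the original review protocol.

```python
import math

def pretest_smd(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference between treatment and control groups at pretest."""
    # Pooled standard deviation of the two groups at pretest
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

def passes_baseline_screen(mean_t, sd_t, n_t, mean_c, sd_c, n_c, max_diff=0.50):
    """Criterion 8: retain only studies whose pretest difference is at most 0.50 SD."""
    return abs(pretest_smd(mean_t, sd_t, n_t, mean_c, sd_c, n_c)) <= max_diff

# Example (invented numbers): a pretest gap of 0.25 SD is retained, a gap of 0.80 SD is not
print(passes_baseline_screen(50.0, 10.0, 120, 47.5, 10.0, 118))  # True
print(passes_baseline_screen(50.0, 10.0, 120, 42.0, 10.0, 118))  # False
```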

Literature search procedures

All qualified studies in the current review come from three main sources. (1) Previous reviews: studies analyzed in the previous reviews were further examined. (2) Electronic searches: a comprehensive literature search of articles written up to 2021 was conducted to identify qualifying studies. Electronic searches were carried out through educational databases (e.g., ERIC, EBSCO, JSTOR, Psych INFO, ScienceDirect, Scopus, Dissertation Abstracts, ProQuest, WorldCat, CNKI), web-based repositories (e.g., Google, Google Scholar), and gray literature databases (e.g., OpenGrey, OpenDOAR). The key words for the search included ‘formative assessment,’ ‘formative evaluation,’ ‘feedback,’ ‘assessment for learning,’ ‘assessment as learning,’ ‘curriculum-based assessment,’ ‘differentiated instruction,’ ‘portfolio assessment,’ ‘performance assessment,’ ‘process assessment,’ ‘progress monitoring,’ ‘response to intervention’ ( Gersten et al., 2020 ), as well as the subset forms under the formative assessment umbrella suggested by Klute et al. (2017) (e.g., self-monitoring, self-assessment, self-direct, peer assessment). (3) Relevant contextualized assessments: the following contextualized assessment projects and systems were included in the search procedure: learning-oriented assessment ( Carless, 2007 ), A2i (Assessment to Instruction) ( Connor et al., 2007 ), SLOA (Self-directed Learning Oriented Assessment) ( Mok, 2012 ), LPA (learning progress assessment) ( Förster and Souvignier, 2014 ), DIALANG (Diagnostic Language Assessment) ( Zhang and Thompson, 2004 ) and CoDiAs (Cognitive Diagnostic Assessment System) ( Leighton and Gierl, 2007 ).

Articles found in the databases were first screened by the lead author at the title and abstract level to check whether the purpose of the study matched the independent (formative assessment intervention program) and dependent (reading outcome) variables guiding this meta-analysis. Records identified through database searching numbered 8,048. Additionally, 21 studies were found from previous meta-analyses ( Kingston and Nash, 2011 ; Klute et al., 2017 ) and a literature review ( Lane et al., 2019 ). Seven studies were included from two formative assessment projects: A2i (Assessment to Instruction) ( Connor et al., 2007 , 2011 , 2013 ; Al Otaiba et al., 2011 ) and LPA (learning progress assessment) ( Förster and Souvignier, 2014 , 2015 ; Förster et al., 2018 ; Peters et al., 2021 ). In total, 8,076 articles were examined at the title and abstract levels for eligibility and inclusion in this study. In the first round of screening, we mainly parsed out studies that were not experiments or were irrelevant to reading. Then, 113 articles were retained for full-text examination. By applying the inclusion criteria of this review, full-text articles were excluded for the following reasons: no control group (e.g., Topping and Fisher, 2003 ), no pretest (e.g., Cain, 2015 ), pretest differences of over 0.50 SD (e.g., Hall et al., 2014 ), a focus only on spelling or vocabulary (e.g., Faber and Visscher, 2018 ), no reading achievement outcome (e.g., Marcotte and Hintze, 2009 ), a sample size of fewer than 30 participants (e.g., Chen et al., 2021 ), students in special education (e.g., Fuchs et al., 1992 ), and studies at tertiary level (e.g., Palmer and Devitt, 2014 ). The numbers in each category can be seen in Figure 1 .

[Figure 1. PRISMA flow chart ( Moher et al., 2009 ).]

Coding scheme

To assess the relationship between effects and studies’ methodological and substantive features, studies were coded. Methodological features referred to research design and sample size. Substantive features entailed types of publication, grade levels, types of intervention, program duration, implementation, cultural settings, year of publication, students’ characteristics, online technology. The study features were categorized as follows:

  • (1) Students’ characteristics: Mainstream or at-risk students.
  • (2) Grade levels: Kindergarten, Elementary (Grade 1–6), Middle/High (7–12).
  • (3) Types of intervention: teacher-directed (feedback to teacher, response to intervention), student-directed (peer- or self- assessment), integration of teacher and student assessment.
  • (4) Digital technology: with or without.
  • (5) Program duration: short (less than 1 year), long (≥1 year).
  • (6) Differentiated instruction: with or without, and not applicable for those studies only involved peer- and self- assessment that did not describe teachers’ instruction adjustment.
  • (7) Research design: QED (quasi-experimental design) or RCT (randomized control trial).
  • (8) Sample size: Small ( N ≤ 250 students) or large ( N > 250).
  • (9) Publication type: published or unpublished.
  • (10) Cultural settings: Anglophone culture (Australia, Canada, Ireland, New Zealand, United Kingdom, and United States), CHC (Mainland China, Hong Kong SAR, Taiwan, Singapore, Japan, Korea).

The coding of all characteristics was processed by two researchers independently. Inter-rater reliability was calculated on a randomly selected 20 percent of the studies and reached 87.21 percent. Disagreements were discussed and resolved in light of the proposed definition. All features of formative assessment are presented in Table 2 , and descriptive data for the qualified studies can be found in the Supplementary material .
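As a simple illustration of the double-coding check described above, the snippet below computes percent agreement between two coders over one coded feature. The codes are invented for illustration, and the random selection of the 20 percent subsample is assumed to have already happened.

```python
def percent_agreement(coder_a, coder_b):
    """Share of coding decisions on which two coders agree, as a percentage."""
    assert len(coder_a) == len(coder_b)
    matches = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return 100.0 * matches / len(coder_a)

# Hypothetical codes for one feature (type of intervention) across 8 double-coded studies
coder_a = ["teacher", "integrated", "student", "teacher", "teacher", "integrated", "student", "teacher"]
coder_b = ["teacher", "integrated", "student", "integrated", "teacher", "integrated", "student", "teacher"]
print(f"Agreement: {percent_agreement(coder_a, coder_b):.2f}%")  # 87.50%
```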

Coding scheme features.

Effect size calculations and statistical analyses

In general, effect sizes were calculated as the difference between experimental and control student posttests after adjusting for pretests and other covariates, divided by the unadjusted posttest pooled standard deviation. When unadjusted pooled standard deviation was not available, as when the only standard deviation presented was already adjusted for covariates or when solely gain score standard deviations were available, procedures proposed by Sedlmeier and Gigerenzer (1989) and Lipsey and Wilson (2001) were used to estimate effect sizes. Provided that pretest and posttest means and standard deviations were presented but adjusted means were not, effect sizes for pretests were subtracted from effect sizes for posttests. An overall average effect size was produced for each study as these outcome measures were not independent. Comprehensive Meta-Analysis software was employed to carry out all statistical analyses, such as Q statistics and overall effect sizes.
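A minimal sketch of the default effect size computation described above, assuming simple two-group pretest and posttest means and standard deviations are reported: the pretest standardized mean difference is subtracted from the posttest standardized mean difference, as in the fallback procedure the review describes. Variable names and the example numbers are illustrative only.

```python
import math

def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled (unadjusted) standard deviation of two independent groups."""
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))

def study_effect_size(pre_t, pre_c, post_t, post_c):
    """Posttest standardized mean difference minus pretest standardized mean difference.

    Each argument is a tuple (mean, sd, n) for the treatment (t) or control (c) group.
    """
    d_post = (post_t[0] - post_c[0]) / pooled_sd(post_t[1], post_t[2], post_c[1], post_c[2])
    d_pre = (pre_t[0] - pre_c[0]) / pooled_sd(pre_t[1], pre_t[2], pre_c[1], pre_c[2])
    return d_post - d_pre

# Example: treatment gains 0.30 SD more than control between pretest and posttest
es = study_effect_size(pre_t=(50, 10, 120), pre_c=(50, 10, 118),
                       post_t=(58, 10, 120), post_c=(55, 10, 118))
print(round(es, 2))  # 0.3
```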

Overall effects

A total of 48 qualifying studies was included in the final analysis, with a total sample size of 116,051 K-12 students: 9 kindergarten studies ( N = 2,040), 28 elementary studies ( N = 107,919), and 11 middle/high studies ( N = 6,092). The overall effect sizes were calculated in fixed and random effects models. The large Q value ( Q = 313.56, df = 47, p < 0.001) indicated that the distribution of effect sizes in this set of studies is highly heterogeneous; in other words, the variance of study effect sizes is larger than can be explained by simple sampling error. Thus, a random effects model was adopted ( DerSimonian and Laird, 1986 ; Borenstein et al., 2009 ; Schmidt et al., 2009 ). As shown in Table 3 , the overall weighted effect size is +0.18, with a confidence interval between 0.14 and 0.22. In an attempt to interpret this variance, key methodological features (sample size, research design), substantive features (student characteristics, program duration, type of intervention, grade level, digital technology involvement), and extrinsic features (publication type, culture) were used to model some of the variance. An overview of the effect sizes can be seen in Figure 2 , which provides a graphical representation of the estimated results of all included studies.
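To make the fixed- versus random-effects choice above concrete, the following sketch pools per-study effect sizes with the DerSimonian-Laird estimator cited in the text and reports the Q statistic. The input numbers are invented; the Comprehensive Meta-Analysis software used in the review performs the equivalent computation.

```python
import numpy as np

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling with Cochran's Q heterogeneity statistic."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # fixed-effect weights
    fixed = np.sum(w * y) / np.sum(w)             # fixed-effect pooled estimate
    q = np.sum(w * (y - fixed) ** 2)              # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance estimate
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, ci, q, tau2

# Hypothetical per-study effect sizes and sampling variances
effects = [0.35, 0.10, 0.22, -0.05, 0.40, 0.18]
variances = [0.02, 0.005, 0.01, 0.004, 0.03, 0.008]
pooled, ci, q, tau2 = random_effects_pool(effects, variances)
print(f"pooled d = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), Q = {q:.1f}, tau^2 = {tau2:.3f}")
```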

Overall effect size.

[Figure 2. Forest plot of formative assessment effect size on K-12 students’ reading achievement.]

Subgroup analysis

The heterogeneity in the overall effect calculation implies that the large differences between these 48 included studies might be related to researchers’ choice of methodology, samples’ substantive features, and other factors. Before our meta-regression analysis, we first estimated the comparisons in subgroups, as shown in Table 4 . Significant differences in effect sizes were found for seven moderators: grade level, type of intervention, program duration, differentiated instruction, sample size, publication type, and cultural setting. The effect size comparisons of subcategories in the remaining three moderators, student characteristics, digital technology, and research design, were non-significant.

Subgroup analysis results.

*p < 0.05, **p < 0.01, ***p < 0.001.

The reason we performed subgroup analysis was to provide basic descriptive data of the ‘constructed’ features in formative assessment. Six substantive features have been investigated in previous reviews ( Kingston and Nash, 2011 ; Klute et al., 2017 ; Lee et al., 2020 ). In this review, we added methodological factors (research design and sample size) and other factors (publication types and cultural settings) to examine how effect size of formative assessment on reading would vary in these categories. Hence, after we added new moderators into the meta-regression model, the subgroup analysis data would help us decipher why some of the results were contrary to previous findings.

Meta-regression

To address the second research question, we regressed all the moderator variables in a model presented in Table 5 , which describes the predicted difference in standard deviations (the coefficient) when comparing the categories of each moderator after controlling for the other features of the formative assessment interventions. The effects from meta-regression are assumed to be more reliable than the results from the subgroup analysis because the model takes account of the joint influence of the different moderators. In our proposed model, only three pairs of moderator categories showed significant differences.
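The meta-regression described here can be approximated as a weighted least-squares regression of study effect sizes on dummy-coded moderators, with weights based on the total (within-study plus between-study) variance. The sketch below uses invented data, a single dummy moderator, and a fixed tau-squared, so it only illustrates the structure of such a model and does not reproduce the review's estimates.

```python
import numpy as np

def weighted_meta_regression(effects, variances, moderators, tau2):
    """Weighted least squares: effect size ~ intercept + dummy-coded moderators.

    Weights are inverse total variances (within-study variance + tau^2),
    mirroring a simple random-effects meta-regression.
    """
    y = np.asarray(effects, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(moderators, dtype=float)])
    w = 1.0 / (np.asarray(variances, dtype=float) + tau2)
    sw = np.sqrt(w)
    # Solve the weighted problem as ordinary least squares on the transformed data
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta  # [intercept, coefficient for each moderator column]

# Invented data: one dummy moderator, 1 = small sample (N <= 250), 0 = large sample
effects = [0.45, 0.38, 0.30, 0.12, 0.08, 0.15]
variances = [0.03, 0.025, 0.02, 0.004, 0.005, 0.006]
small_sample = [[1], [1], [1], [0], [0], [0]]
beta = weighted_meta_regression(effects, variances, small_sample, tau2=0.01)
print(f"intercept = {beta[0]:.2f}, small-sample coefficient = {beta[1]:.2f}")
```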

Results of meta-regression.

First, the effect size from different sample sizes varied substantially. As aforementioned, we set N = 250 as a cut-off point. Clearly, small sample size studies yielded a significantly larger effect size ( d = 0.33, p < 0.001) than large sample studies.

Next, three types of intervention were examined, namely teacher-directed, student-directed, and integration of teacher and student assessment. Results indicated that, when the other features of formative assessment were controlled, formative assessment directed by teachers alone had a significantly smaller effect size than interventions integrating teacher and student assessment in reading ( d = –0.12, p < 0.001), whereas student-directed assessment (self-assessment) showed no significant difference ( d = 0.03, p = 0.769).

Third, in regard to differentiated instruction , we coded formative assessment in reading as with or without differentiated instruction. Some interventions that only involved student-directed assessment without teachers’ instructional adjustment were coded as not applicable (n.a.). The results, as we hypothesized, favored teachers who used differentiated instruction during or after their formative assessment of students’ reading. If a teacher adopted differentiated or individualized instruction during or after formative assessment, students’ reading achievement was significantly higher than that of peers taught by a teacher who applied formative assessment alone ( d = 0.13, p < 0.001). When formative assessment was directed by the student, the effect size on reading achievement was larger than that of formative assessment with teachers’ differentiated instruction, albeit not significantly ( d = 0.17, p = 0.124).

Apart from the three pairs of contrast, the rest of the moderator variables comparisons were non-significant, although some showed significant results in subgroup analysis.

Given that research design might influence the effect size, we categorized all studies into randomized controlled trials (RCT) and quasi-experiments (QED). Results from the regression model indicated that effect sizes generated from RCT designs were smaller than those from QED designs, but not significantly so ( d = –0.04, p = 0.465).

Digital technology involvement was examined. Surprisingly, students’ reading achievement was not influenced significantly by formative assessment with digital technology ( d = 0.001, p = 0.978).

Formative assessment in reading classrooms seemed to exert a similar impact on students at different grade levels. The effect sizes at the elementary level were slightly larger than those for kindergarten studies ( d = 0.02, p = 0.790), and effect sizes for middle/high school studies were slightly smaller than those for kindergarten studies ( d = –0.03, p = 0.696).

Student characteristics were coded into mainstream students and at-risk students. The estimated effect size of formative assessment on mainstream students was slightly higher than that on at-risk students, albeit not significant ( d = 0.001, p = 0.982).

With respect to program duration , studies that lasted less than 1 year were coded as short programs; the others were coded as long. Programs of 1 year or longer showed smaller effect sizes than short-term ones, but the difference was non-significant ( d = –0.02, p = 0.576).

Different from the subgroup analysis result regarding cultural setting , the effect size of formative assessment in CHC appeared to be smaller than that in Anglophone culture, though not significantly ( d = –0.10, p = 0.297).

Results of publication type revealed that no significant differences were found between published and unpublished articles ( p = 0.701), indicating that no publication bias existed in this review.

The pseudo R² value in this meta-regression model estimated that the moderators accounted for 95% of the heterogeneity. The predictive power of this value is reliable, as the number of studies ( k ) in this review exceeded the minimum of 40 suggested by López-López et al. (2014) .
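The paper does not spell out how the pseudo R² was computed; a standard formulation (our assumption, following common random-effects meta-regression practice) compares the between-study variance before and after adding the moderators:

$$ R^2_{\text{pseudo}} = \frac{\hat{\tau}^2_{\text{RE}} - \hat{\tau}^2_{\text{ME}}}{\hat{\tau}^2_{\text{RE}}} $$

where the first term in the numerator is the between-study variance of the moderator-free random-effects model and the second is the residual between-study variance of the mixed-effects (meta-regression) model; under this reading, a value of 0.95 means the moderators jointly account for about 95% of the between-study heterogeneity.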

Overall effect size

The findings of this review indicate that formative assessment produces a positive effect (ES = + 0.18) on reading achievement. The magnitude could be interpreted as a small effect according to the oft-cited benchmarks of small ( d = 0.2), medium ( d = 0.5), and large ( d = 0.8) effect sizes ( Cohen, 1988 ). However, Kraft (2020) , taking study features, program costs, and scalability into account, proposed a new benchmark frame for effect sizes from causal studies of pre-K-12 education interventions, namely small ( d < 0.05), medium (0.05 to <0.20), and large (≥0.20). Accordingly, the overall aggregated effect size in this review could be taken as a medium effect. Compared with the effect sizes (from +0.22 to +0.70) of previous meta-analyses, the weighted average effect size reported in this review was the smallest. Two potential factors may explain this. First, some early reviews set comparatively looser criteria for inclusion, which often inflates effect size estimates. Because of our stricter inclusion criteria, five studies in the Klute et al. (2017) review with fewer than 30 participants in each group ( Fuchs et al., 1989 ; McCurdy and Shapiro, 1992 ; Johnson et al., 1997 ; Iannuccilli, 2003 ; Martens et al., 2007 ) were ruled out of the present review. Second, 35 of our selected studies were conducted after 2010, whereas the most recent previous meta-analysis ( Klute et al., 2017 ) only included studies up to 2007. As publication bias might be mitigated over time ( Guan and Vandekerckhove, 2016 ), more non-significant or even negative findings were reported. In this review, two large-scale studies ( Konstantopoulos et al., 2016 ; Allen, 2019 ) involving over 35,000 students reported negative effects of formative assessment on reading achievement.

The effects of moderators

The meta-regression results indicate that sample size, differentiated instruction, and type of intervention suffice to account for the heterogeneity of the effect sizes. Additionally, we discuss some implications of the results concerning cultural settings, digital technology, and publication bias.

Sample size

Prior research indicated that studies with small sample sizes tend to yield much larger effect sizes than large ones ( Liao, 1999 ; Cheung and Slavin, 2016 ). In this review, sample size was a crucial variable that might influence the effect size of formative assessment on reading achievement. Two explanations could be put forward for this result. First, intuitively, small-scale studies are more likely to be implemented with high fidelity: teachers might find it easier to give more support to students and monitor their progress, and researchers are more likely to purposefully recruit motivated teachers and schools. In this sense, small studies tend to produce larger effect sizes than large-scale studies. Second, researchers using small samples are more apt to design self-developed outcome measures ( Wang, 2008 ; Tsai et al., 2015 ; Lau, 2020 ; Yan et al., 2020 ), which might be more sensitive to treatments than standardized measures ( Cheung and Slavin, 2016 ).

One of the key findings in our review was the positive effects of differentiated instruction during or after formative assessment on reading achievement for K-12 students. This significant result is in accord with the findings from an influential U.S. data-driven reform model on state assessment program. Slavin et al. (2013) found that, for fifth-grade reading, those schools and teachers adjusting reading instruction produced educationally important gains in achievement, while others did not if they merely understood students’ data without further action on instructional adjustment. Formative assessment was analogous to taking a patient’s temperature, while differentiated instruction was analogous to providing a treatment ( Slavin et al., 2013 ).

In a study included in our review with a comparatively promising large effect size ( d = + 0.63) on an early literacy program designed for students at risk, the researchers concluded that “if one practices formative assessment seriously, one will necessarily end up differentiating instruction” ( Brookhart et al., 2010 , p. 50). In a recent review on formative assessment ( Lee et al., 2020 ), the research team coded a similar moderator, “instructional adjustment,” and revealed no significant contrast between their four moderator variables: no adjustment, planned adjustment, unplanned adjustment, and mixed. We assume that the effects might be attenuated when too many variables are coded, leaving insufficient numbers in each category. Additionally, in our own model, we initially added professional development as a moderator; however, this moderator was highly correlated with differentiated instruction, and meta-regression could not be computed due to the collinearity. It is worth mentioning that, in our qualified studies, 94% (34/36) of the interventions embedded with differentiated instruction were coupled with professional development for teachers. This evidence in turn implies that professional development is vital for fostering high fidelity in implementing formative assessment in reading programs.

The type of intervention result indicated that an integration of teacher-directed and student-directed assessment would be more effective than formative assessment in reading programs directed by teacher or student alone in K-12 settings. In a previous meta-analysis, the research team concluded that other-directed formative assessment, encompassing educators or computer software programs, was more effective than student-directed formative assessment ( Klute et al., 2017 ). They included nine studies in their review, six of which were designed for students with special educational needs; these participants might be less capable of carrying out self- or peer-directed formative assessment. In the present review, a more holistic picture of the general population was obtained, advocating an integrated use of teacher-directed and student-directed assessment.

The results of our review suggested that integrating teacher and student in formative assessment might be more effective than teacher- or student-directed assessment alone in enhancing students’ reading achievement. We attempted to explain this based on linguistic theory ( Kintsch and Van Dijk, 1978 ). Production-based subjects like writing might benefit more when the formative assessment is student-centered ( Black and Wiliam, 1998 ), but reading is a comprehension-based subject that requires explicit instruction necessitating teachers’ guidance ( McLaughlin, 2012 ). Also, feedback messages require students’ active construction in deciphering them with the help of teachers ( Ivanic et al., 2000 ; Higgins et al., 2001 ). A caveat from our screening of studies in Anglophone and Confucian-heritage cultures, however, is that this is not a one-size-fits-all suggestion.

Cultural setting

The subgroup analysis comparison of interventions in the two cultures was significant: studies conducted in Confucian-heritage culture yielded ostensibly much larger effect sizes than those in Anglophone culture. Nevertheless, the non-significant result in the meta-regression indicated that this was influenced by other variables. Drawing on the data and the evidence we collected, we found it hard not to associate the impact with sample size. All the qualified studies in CHC were of small sample size, and sample size was one of the significant moderators contributing to the variance of effect sizes in this review. After controlling for other moderators, no significant differences were found between the interventions in the two cultures.

Although it would be premature to conclude that there is no difference between studies in Confucian-heritage and Anglophone cultures, our screening process and the descriptive data from the subgroup analysis offer some hints for interpreting the results.

Only eight qualified studies were set in CHC, compared with 38 in Anglophone culture. The limited number of experimental studies from CHC settings may reflect the barriers to formative assessment interventions there. Teachers in CHC (Mainland China, Hong Kong SAR, Taiwan, Singapore, Japan, Korea) are often challenged by large class sizes (Hu, 2002) and high-stakes test pressure (Berry, 2011), which add to their psychological burden around assessment (Chen et al., 2013). These sociocultural factors drastically hinder the translation, that is, the local adaptation of an educational policy (Steiner-Khamsi, 2014), of formative assessment. When a school advocates formative assessment without offering appropriate professional development, teachers regard it as a "villain of workload" (Black, 2015), and teachers in a test-driven culture inevitably treat formative assessment as another "test" rather than as instruction.

Of particular interest is the hint offered by the promising results of the included CHC studies. Researchers in CHC contexts have begun to explore alternative ways of implementing formative assessment: six of the eight CHC studies in our review used self-assessment (Wang, 2008; Butler and Lee, 2010; Tsai et al., 2015; Chen et al., 2017; Lau, 2020; Yan et al., 2020). This points to self-assessment as a possible avenue for reading teachers who wish to embed formative assessment in their teaching in CHC classrooms, although the data reported in this review do not allow us to conclude that it is the most effective approach.

Previous meta-analyses have found that mobile devices (Sung et al., 2016) and educational technology (Slavin, 2013) make no significant difference to students' academic achievement, and that digitally delivered formative assessment benefits reading only for young school-age children, not for older ones (See et al., 2021). In line with those reviews, our findings indicated that formative assessment delivered with digital technology does not influence students' reading achievement significantly more than traditional paper-and-pencil interventions. This cautions that digital technology is not the kernel of formative assessment. Nevertheless, our findings still support technology-enhanced formative assessment because it can provide an evidence-based platform that scaffolds students' learning by generating and deploying formative feedback, and computer-based formative assessment systems are generally more accessible to teachers and students than traditional methods (Tomasik et al., 2018). Two lessons can be drawn from the disappointing effect sizes of some digital formative assessment programs: (1) a digital program can be effective when intervention teachers differentiate their instructional practices based on the evidence it feeds back; researchers of benchmark or interim assessments with small (Cordray et al., 2013) or even negative effect sizes (Konstantopoulos et al., 2016) reflected that teachers might need further support to adjust their teaching, given their crowded classroom schedules; (2) professional development and training for teachers participating in digital formative assessment are indispensable prerequisites for the quality of practice, since support in understanding the concept and the provision of technical assistance are essential for instructional change (Connor et al., 2009; Kennedy, 2009).

Publication bias

To mitigate the threat of publication bias, we included 10 unpublished studies in this review. Traditional methods for assessing publication bias include visual inspection of the symmetry of a funnel plot (Sterne et al., 2005), "fail-safe N" statistics (Orwin, 1983), the trim-and-fill method (Duval and Tweedie, 2000), and treating publication status as a moderator to test the difference in mean effect sizes between published and unpublished studies (Polanin and Pigott, 2015). Because the last method is comparatively straightforward and more objective than eyeballing a plot, we entered publication status as a moderator in the meta-regression and compared the mean effect sizes of published and unpublished studies while controlling for other factors. No significant difference was found between the two groups of studies, so we believe publication bias is not a concern for the current meta-analysis.
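
The moderator-style check can be approximated by hand: pool published and unpublished studies separately with inverse-variance weights and test the difference between the two pooled means. The sketch below uses illustrative effect sizes and variances, not the review's data, and omits the covariate adjustment described above.

    # Minimal sketch (illustrative numbers): publication status as a simple moderator.
    import numpy as np

    d = np.array([0.25, 0.15, 0.30, 0.10, 0.22, 0.18, 0.05, 0.20, 0.12, 0.28])   # effect sizes
    v = np.array([0.02, 0.01, 0.03, 0.02, 0.015, 0.04, 0.02, 0.03, 0.01, 0.05])  # sampling variances
    published = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])                          # 1 = published, 0 = grey literature

    def pooled(dd, vv):
        """Fixed-effect inverse-variance pooled mean and its variance."""
        w = 1.0 / vv
        return np.sum(w * dd) / np.sum(w), 1.0 / np.sum(w)

    d_pub, v_pub = pooled(d[published == 1], v[published == 1])
    d_unp, v_unp = pooled(d[published == 0], v[published == 0])

    # z-test for the between-group difference; a full analysis would also
    # control for the other moderators, as done in the meta-regression.
    z = (d_pub - d_unp) / np.sqrt(v_pub + v_unp)
    print(f"published d = {d_pub:.2f}, unpublished d = {d_unp:.2f}, z = {z:.2f}")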

This review has revealed that, free of publication bias, formative assessment makes a positive and modest difference to students' reading achievement in diverse settings. The average weighted effect across all included studies was +0.19, although the exact size a researcher finds may deviate considerably depending on sample size, teachers' differentiated instruction, and type of intervention. Studies with large samples of over 250 students yielded low, attenuated estimates of the effect of formative assessment. The implementation of teachers' differentiated instruction is linked to much stronger effects than interventions without it, and our results suggest that collaboration between teachers and students in formative assessment is more effective than formative assessment initiated by teachers alone. Teachers are therefore strongly encouraged, during formative assessment and in cooperation with the students themselves, to adjust the content, process, and product of their reading instruction to cater to student diversity (Tomlinson, 2001). Studies combining differentiated instruction with teachers' professional development showed a positive and modest effect on reading outcomes; to enhance students' reading achievement and upskill teachers, future study designs should focus more on the components that effectively facilitate differentiated instruction and professional development.
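
For completeness, the weighted average reported above can be reproduced in principle with a standard random-effects calculation. The sketch below implements the DerSimonian and Laird (1986) estimator cited in this review on placeholder effect sizes and variances; it is a minimal illustration of the pooling step, not the review's actual computation, which was run in dedicated meta-analysis software.

    # Minimal sketch of a DerSimonian-Laird random-effects pooled effect size.
    import numpy as np

    d = np.array([0.63, 0.10, 0.25, -0.05, 0.40])   # study effect sizes (placeholder values)
    v = np.array([0.04, 0.01, 0.02, 0.015, 0.05])   # within-study variances (placeholder values)

    w_fe = 1.0 / v
    d_fe = np.sum(w_fe * d) / np.sum(w_fe)

    # Between-study heterogeneity (tau^2) via the DL method-of-moments estimator.
    Q = np.sum(w_fe * (d - d_fe) ** 2)
    df = len(d) - 1
    C = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
    tau2 = max(0.0, (Q - df) / C)

    # Random-effects weights add tau^2 to each study's variance before pooling.
    w_re = 1.0 / (v + tau2)
    d_re = np.sum(w_re * d) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    print(f"tau^2 = {tau2:.3f}, pooled d = {d_re:.2f} (SE = {se_re:.2f})")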

This meta-analysis contributes to the existing understanding of formative assessment in K-12 reading programs in three significant ways. First, it systematically records the critical components of formative assessment in reading programs, giving frontline teachers a reference for catering to learner diversity (Snow, 1986). Second, it affords a new cross-cultural perspective by comparing Western and Eastern formative assessment practices, helping school administrators and policymakers tailor effective programs to their own cultural contexts. Lastly, it substantiates the discipline-specific characteristics of reading in conceptualizing formative assessment for K-12 reading programs (Bennett, 2011), which is pivotal to a next-generation definition of domain-dependent formative assessment (Cizek et al., 2019).

Several limitations of this review should be mentioned. First, it focused solely on quantitative measures of reading achievement; evidence-based education also values the insightful and irreplaceable findings of qualitative research (Slavin and Cheung, 2019), and there is much to learn from non-experimental studies that can interpret the effects of formative assessment on students' reading. Second, the review centered on standardized tests of reading achievement, yet other outcomes may be of great value to policymakers and practitioners. Third, student-directed assessment usually refers to peer- or self-assessment, but the qualified studies in this meta-analysis included only self-assessment; we recognize the value of peer assessment and strongly suggest that future reviews locate more qualified studies of this type. Lastly, the cultural settings in this study include only Anglophone and CHC contexts because we could not yet locate acceptable studies from other cultures; studies set in all cultures are equally important and should be included where possible, and further reviews could explore research from other cultural contexts.

Educational borrowing from other countries is not a simple matter of duplicating success stories, because the extrapolation and recontextualization of educational interventions are embedded in cultural and historical narratives (Luke et al., 2013). Our subgroup analysis indicated that cultural setting might be a potential moderator. As a wealth of large-scale formative assessment initiatives have been advanced in classrooms heavily influenced by CHC, we encourage the reporting of synthesized effect sizes in CHC settings to ensure that formative assessment remains continuous with the cultural script (Stigler and Hiebert, 1998, 2009). Future reviews can apply narrative synthesis methods to explore the factors that advance or hinder the development of formative assessment in reading in CHC, and, given the complicated implementation of formative assessment in reading (Lane et al., 2019), teachers in CHC classrooms are encouraged to explore their own ways to effectively "import" (Xu and Harfitt, 2018) and "translate" high-quality formative assessment (Black and Wiliam, 1998).

Author contributions

QX conceived of the presented idea. QX and AC performed the analytic calculations and numerical simulations and contributed to the final version of the manuscript. DS contributed important intellectual content to the revised and final versions of the manuscript.

Acknowledgments

This manuscript would not have been possible without the exceptional support of the first author's supervisor, AC, whose expertise and encouragement have been an inspiration and kept the work on track. We are grateful for the insightful comments offered by the peer reviewers. We also thank our families for their unending support.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.990196/full#supplementary-material

  • Abrami P. C., Bernard R. M. (2006). Research on distance education: In defense of field experiments . Distance Educ. 27 , 5–26. [ Google Scholar ]
  • Al Otaiba S., Connor C. M., Folsom J. S., Greulich L., Meadows J., Li Z. (2011). Assessment data–informed guidance to individualize kindergarten reading instruction: Findings from a cluster-randomized control field trial. Elem. Sch. J. 111 535–560. 10.1086/659031 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Allen J. (2019). Does adoption of act aspire periodic assessments support student growth?. Iowa, IA: ACT, Inc. [ Google Scholar ]
  • Andrade H. L., Bennett R. E., Cizek G. J. (2019). “ Formative assessment: History, definition, and progress ,” in Handbook of formative assessment in the disciplines , eds Andrade H. L., Bennett R. E., Cizek G. J. (New York, NY: Routledge; ), 3–19. 10.4324/9781315166933-1 [ CrossRef ] [ Google Scholar ]
  • Bennett R. E. (2011). Formative assessment: A critical review. Assess. Educ. Princ. Policy Pract. 18 5–25. 10.1080/0969594X.2010.513678 [ CrossRef ] [ Google Scholar ]
  • Berry R. (2011). Assessment trends in Hong Kong: Seeking to establish formative assessment in an examination culture. Assess. Educ. Princ. Policy Pract. 18 199–211. 10.1080/0969594X.2010.527701 [ CrossRef ] [ Google Scholar ]
  • Biggs J. (1998). Learning from the confucian heritage: So size doesn’t matter? Int. J. Educ. Res. 29 723–738. 10.1016/S0883-0355(98)00060-3 [ CrossRef ] [ Google Scholar ]
  • Black H., Wiliam D. (1986). “ Assessment for learning ,” in Assessing educational achievement , ed. Nuttall D. L. (London: Falmer Press; ), 7–18. [ Google Scholar ]
  • Black P. (2015). Formative assessment – an optimistic but incomplete vision. Assess. Educ. Princ. Policy Pract. 22 161–177. 10.1080/0969594X.2014.999643 [ CrossRef ] [ Google Scholar ]
  • Black P., Wiliam D. (1998). Assessment and classroom learning. Assess. Educ. Princ. Policy Pract. 5 7–74. 10.1080/0969595980050102 [ CrossRef ] [ Google Scholar ]
  • Black P., Wiliam D. (2009). Developing the theory of formative assessment. Educ. Assess. Eval. Acc. 21 5–31. 10.1007/s11092-008-9068-5 [ CrossRef ] [ Google Scholar ]
  • Black P., Wiliam D. (2018). Classroom assessment and pedagogy. Assess. Educ. Princ. Policy Pract. 25 551–575. 10.1080/0969594X.2018.1441807 [ CrossRef ] [ Google Scholar ]
  • Borenstein M., Cooper H., Hedges L., Valentine J. (2009). “ Effect sizes for continuous data ,” in The handbook of research synthesis and meta-analysis , 2nd Edn, eds Cooper H., Hedges L. V., Valentine J. C. (New York, NY: Russell Sage Foundation; ), 221–235. [ Google Scholar ]
  • Borenstein M., Hedges L., Higgins J., Rothstein H. (2013). Comprehensive meta-analysis version 3. Englewood, CO: Biostat. [ Google Scholar ]
  • Briggs D. C., Ruiz-Primo M. A., Furtak E., Shepard L., Yin Y. (2012). Meta-analytic methodology and inferences about the efficacy of formative assessment. Educ. Meas. 31 13–17. 10.1111/j.1745-3992.2012.00251.x [ CrossRef ] [ Google Scholar ]
  • Brookhart S. M., Moss C. M., Long B. A. (2010). Teacher inquiry into formative assessment practices in remedial reading classrooms. Assess. Educ. Princ. Policy Pract. 17 41–58. 10.1080/09695940903565545 [ CrossRef ] [ Google Scholar ]
  • Burns M. K., Codding R. S., Boice C. H., Lukito G. (2010). Meta-analysis of acquisition and fluency math interventions with instructional and frustration level skills: Evidence for a skill-by-treatment interaction. Sch. Psychol. Rev. 39 69–83. 10.1080/02796015.2010.12087791 [ CrossRef ] [ Google Scholar ]
  • Butler Y., Lee J. (2010). The effects of self-assessment among young learners of English. Lang. Test. 27 5–31. 10.1177/0265532209346370 [ CrossRef ] [ Google Scholar ]
  • Cain M. L. (2015). The impact of the reading 3D program as a component of formative assessment. Doctoral dissertation. Charlotte, NC: Wingate University. [ Google Scholar ]
  • Carless D. (2007). Learning-oriented assessment: Conceptual bases and practical implications. Innov. Educ. Teach. Int. 44 57–66. 10.1080/14703290601081332 [ CrossRef ] [ Google Scholar ]
  • Chappuis J. (2009). Seven strategies of assessment for learning. Portland, OR: Pearson Assessment Training Institute. [ Google Scholar ]
  • Chen C., Chen L., Horng W. (2021). A collaborative reading annotation system with formative assessment and feedback mechanisms to promote digital reading performance. Interact. Learn. Environ. 29 848–865. 10.1080/10494820.2019.1636091 [ CrossRef ] [ Google Scholar ]
  • Chen C., Wang J., Lin M. (2017). Enhancement of English learning performance by using an attention-based diagnosing and review mechanism in paper-based learning context with digital pen support. Univers. Access Inf. Soc. 18 141–153. 10.1007/s10209-017-0576-2 [ CrossRef ] [ Google Scholar ]
  • Chen Q., Kettle M., Klenowski V., May L. (2013). Interpretations of formative assessment in the teaching of English at two Chinese universities: A sociocultural perspective. Assess. Eval. High Educ. 38 831–846. 10.1080/02602938.2012.726963 [ CrossRef ] [ Google Scholar ]
  • Cheung A. C. K., Slavin R. E. (2016). How methodological features affect effect sizes in education. Educ. Res. 45 283–292. 10.3102/0013189X16656615 [ CrossRef ] [ Google Scholar ]
  • Cheung A. C. K., Xie C., Zhuang T., Neitzel A. J., Slavin R. E. (2021). Success for all: A quantitative synthesis of US evaluations. J. Res. Educ. Eff. 14 90–115. 10.1080/19345747.2020.1868031 [ CrossRef ] [ Google Scholar ]
  • Cizek G. J., Andrade H. L., Bennett R. E. (2019). “ Formative assessment: History, definition, and progress ,” in Handbook of formative assessment in the disciplines , eds Heidi L. A., Gregory J. C. (New York, NY: Routledge; ), 3–19. [ Google Scholar ]
  • Clark I. (2010). Formative assessment: ‘There is nothing so practical as a good theory’. Aust. J. Educ. 54 341–352. 10.1177/000494411005400308 [ CrossRef ] [ Google Scholar ]
  • Cohen J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge. [ Google Scholar ]
  • Connor C. M., Jakobsons L., Crowe E., Meadows J. (2009). Instruction, differentiation, and student engagement in reading first classrooms. Elem. Sch. J. 109 221–250. 10.1086/592305 [ CrossRef ] [ Google Scholar ]
  • Connor C. M., Morrison F. J., Fishman B. J., Schatschneider C., Underwood P. (2007). The early years. Algorithm-guided individualized reading instruction. Science 315 464–465. 10.1126/science.1134513 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Connor C. M., Morrison F. J., Fishman B., Crowe E. C., Al Otaiba S., Schatschneider C. (2013). A longitudinal cluster-randomized controlled study on the accumulating effects of individualized literacy instruction on students’ reading from first through third grade. Psychol. Sci. 24 1408–1419. 10.1177/0956797612472204 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Connor C. M., Morrison F. J., Schatschneider C., Toste J., Lundblom E., Crowe E. C., et al. (2011). Effective classroom instruction: Implications of child characteristics by reading instruction interactions on first graders’ word reading achievement. J. Res. Educ. Eff. 4 173–207. 10.1080/19345747.2010.510179 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cordray D. S., Pion G. M., Brandt C., Molefe A. (2013). The impact of the measures of academic progress (MAP) program on student reading achievement. Final report. Washington, DC: NCEE. [ Google Scholar ]
  • Crossouard B., Pryor J. (2012). How theory matters: Formative assessment theory and practices and their different relations to education . Stud. Philos. Educ. 31 , 251–263. [ Google Scholar ]
  • DerSimonian R., Laird N. (1986). Meta-analysis in clinical trials. Control. Clin. Trials 7 177–188. 10.1016/0197-2456(86)90046-2 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dunn K., Mulvenon S. (2009). A critical review of research on formative assessment: The limited scientific evidence of the impact of formative assessment in education. Pract. Assess. Res. Eval. 14 1–11. 10.4324/9780203462041_chapter_1 [ CrossRef ] [ Google Scholar ]
  • Duval S., Tweedie R. (2000). A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. J. Am. Stat. Assoc. 95 89–98. 10.1080/01621459.2000.10473905 [ CrossRef ] [ Google Scholar ]
  • Earl L. M. (2012). Assessment as learning: Using classroom assessment to maximize student learning. Thousand Oaks, CA: Corwin press. [ Google Scholar ]
  • ESSA (2015). Evidence for ESSA: Standards and procedures. Available online at: https://content.evidenceforessa.org/sites/default/files/On%20clean%20Word%20doc.pdf (accessed June 20, 2022). [ Google Scholar ]
  • Faber J. M., Visscher A. (2018). The effects of a digital formative assessment tool on spelling achievement: Results of a randomized experiment. Comput. Educ. 122 1–8. 10.1016/j.compedu.2018.03.008 [ CrossRef ] [ Google Scholar ]
  • Förster N., Souvignier E. (2014). Learning progress assessment and goal setting: Effects on reading achievement, reading motivation and reading self-concept. Learn. Instr. 32 91–100. 10.1016/j.learninstruc.2014.02.002 [ CrossRef ] [ Google Scholar ]
  • Förster N., Souvignier E. (2015). Effects of providing teachers with information about their students’ reading progress. Sch. Psychol. Rev. 44 60–75. 10.17105/SPR44-1.60-75 [ CrossRef ] [ Google Scholar ]
  • Förster N., Kawohl E., Souvignier E. (2018). Short- and long-term effects of assessment-based differentiated reading instruction in general education on reading fluency and reading comprehension. Learn. Instr. 56 98–109. 10.1016/j.learninstruc.2018.04.009 [ CrossRef ] [ Google Scholar ]
  • Fuchs L. S., Fuchs D. (1986). Effects of systematic formative evaluation: A meta-analysis. Except. Child. 53 199–208. 10.1177/001440298605300301 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fuchs L. S., Fuchs D., Hamlett C. L., Ferguson C. (1992). Effects of expert system consultation within curriculum-based measurement, using a reading maze task . Except. Child. 58 , 436–450. [ Google Scholar ]
  • Fuchs L. S., Butterworth J. R., Fuchs D. (1989). Effects of ongoing curriculum-based measurement on student awareness of goals and progress. Educ. Treat. Child. 12 63–72. [ Google Scholar ]
  • Gersten R., Chard D. J., Jayanthi M., Baker S. K., Morphy P., Flojo J. (2009). Mathematics instruction for students with learning disabilities: A meta-analysis of instructional components. Rev. Educ. Res. 79 1202–1242. 10.3102/0034654309334431 [ CrossRef ] [ Google Scholar ]
  • Gersten R., Haymond K., Newman-Gonchar R., Dimino J., Jayanthi M. (2020). Meta-analysis of the impact of reading interventions for students in the primary grades. J. Res. Educ. Eff. 13 401–427. 10.1080/19345747.2019.1689591 [ CrossRef ] [ Google Scholar ]
  • Glass G. V., McGaw B., Smith M. L. (1981). Meta-analysis in social research. Thousand Oaks, CA: SAGE Publications. [ Google Scholar ]
  • Graham S., Hebert M., Harris K. R. (2015). Formative assessment and writing: A meta-analysis. Elem. Sch. J. 115 523–547. 10.1086/681947 [ CrossRef ] [ Google Scholar ]
  • Guan M., Vandekerckhove J. (2016). A Bayesian approach to mitigation of publication bias. Psychon. Bull. Rev. 23 74–86. 10.3758/s13423-015-0868-6 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hall T. E., Cohen N., Vue G., Ganley P. (2014). Addressing learning disabilities with udl and technology. Learn. Disabil. Q. 38 72–83. 10.1177/0731948714544375 [ CrossRef ] [ Google Scholar ]
  • Hartmeyer R., Stevenson M. P., Bentsen P. (2018). A systematic review of concept mapping-based formative assessment processes in primary and secondary science education. Assess. Educ. Princ. Policy Pract. 25 598–619. 10.1080/0969594X.2017.1377685 [ CrossRef ] [ Google Scholar ]
  • Hattie J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York: NY: Routledge. [ Google Scholar ]
  • Hattie J., Timperley H. (2007). The power of feedback. Rev. Educ. Res. 77 81–112. 10.3102/003465430298487 [ CrossRef ] [ Google Scholar ]
  • Heitink M. C., Van der Kleij F. M., Veldkamp B. P., Schildkamp K., Kippers W. B. (2016). A systematic review of prerequisites for implementing assessment for learning in classroom practice. Educ. Res. Rev. 17 50–62. 10.1016/j.edurev.2015.12.002 [ CrossRef ] [ Google Scholar ]
  • Heritage M. (2010). Formative assessment: Making it happen in the classroom. Thousand Oaks, CA: Corwin. 10.4135/9781452219493 [ CrossRef ] [ Google Scholar ]
  • Higgins R., Hartley P., Skelton A. (2001). Getting the message across: The problem of communicating assessment feedback. Teach. High. Educ. 6 269–274. 10.1080/13562510120045230 [ CrossRef ] [ Google Scholar ]
  • Hu G. (2002). Potential cultural resistance to pedagogical imports: The case of communicative language teaching in China. Lang. Cult. Curric. 15 93–105. 10.1080/07908310208666636 [ CrossRef ] [ Google Scholar ]
  • Iannuccilli J. A. (2003). Monitoring the progress of first-grade students with dynamic indicators of basic early literacy skills. Indiana, PA: Indiana University of Pennsylvania. [ Google Scholar ]
  • Ivanic R., Clark R., Rimmershaw R. (2000). Student writing in higher education: New contexts. Maidenhead: Open University Press. [ Google Scholar ]
  • Johnson L., Graham S., Harris K. R. (1997). The effects of goal setting and self-instruction on learning a reading comprehension strategy: A study of students with learning disabilities. J. Learn. Disabil. 30 80–91. 10.1177/002221949703000107 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jönsson A. (2020). Definitions of formative assessment need to make a distinction between a psychometric understanding of assessment and “evaluative judgment”. Front. Educ. 5 : 2 . 10.3389/feduc.2020.00002 [ CrossRef ] [ Google Scholar ]
  • Kennedy M. M. (2009). Inside teaching. Cambridge, MA: Harvard University Press. [ Google Scholar ]
  • Kingston N., Broaddus A. (2017). The use of learning map systems to support the formative assessment in mathematics. Educ. Sci. 7 : 41 . 10.3390/educsci7010041 [ CrossRef ] [ Google Scholar ]
  • Kingston N., Nash B. (2011). Formative assessment: A meta-analysis and a call for research. Educ. Meas. Issues Pract. 30 28–37. 10.1111/j.1745-3992.2011.00220.x [ CrossRef ] [ Google Scholar ]
  • Kingston N., Nash B. (2012). How many formative assessment angels can dance on the head of a meta-analytic pin: 0.2. Educ. Meas. Issues Pract. 31 18–19. 10.1111/j.1745-3992.2012.00254.x [ CrossRef ] [ Google Scholar ]
  • Kintsch W., Van Dijk T. A. (1978). Toward a model of text comprehension and production. Psychol. Rev. 85 : 363 . 10.1037/0033-295X.85.5.363 [ CrossRef ] [ Google Scholar ]
  • Kluger A. N., DeNisi A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychol. Bull. 119 254–284. 10.1037/0033-2909.119.2.254 [ CrossRef ] [ Google Scholar ]
  • Klute M., Apthorp H., Harlacher J., Reale M. (2017). Formative assessment and elementary school student academic achievement: A review of the evidence. Washington, DC: Regional Educational Laboratory Central. [ Google Scholar ]
  • Konstantopoulos S., Miller S. R., van der Ploeg A., Li W. (2016). Effects of interim assessments on student achievement: Evidence from a large-scale experiment. J. Res. Educ. Eff. 9 188–208. 10.1080/19345747.2015.1116031 [ CrossRef ] [ Google Scholar ]
  • Kraft M. A. (2020). Interpreting effect sizes of education interventions. Educ. Res. 49 241–253. 10.3102/0013189X20912798 [ CrossRef ] [ Google Scholar ]
  • Lane R., Parrila R., Bower M., Bull R., Cavanagh M., Forbes A., et al. (2019). Literature review: Formative assessement evidence and practice. Melbourne, VI: AITSL. [ Google Scholar ]
  • Lau Kl. (2020). The effectiveness of self-regulated learning instruction on students’ classical Chinese reading comprehension and motivation. Read. Writ. 33 2001–2027. 10.1007/s11145-020-10028-2 [ CrossRef ] [ Google Scholar ]
  • Lee H., Chung H. Q., Zhang Y., Abedi J., Warschauer M. (2020). The effectiveness and features of formative assessment in us k-12 education: A systematic review. Appl. Meas. Educ. 33 124–140. 10.1080/08957347.2020.1732383 [ CrossRef ] [ Google Scholar ]
  • Leighton J., Gierl M. (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge: Cambridge University Press. 10.1017/CBO9780511611186 [ CrossRef ] [ Google Scholar ]
  • Liao Y.-K. C. (1999). Hypermedia and students’achievement: A meta-analysis. EdMedia Innov. Learn. 8 , 1398–1399. [ Google Scholar ]
  • Lipsey M. W., Wilson D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: SAGE Publications. [ Google Scholar ]
  • López-López J. A., Marín-Martínez F., Sánchez-Meca J., Van den Noortgate W., Viechtbauer W. (2014). Estimation of the predictive power of the model in mixed-effects meta-regression: A simulation study. Br. J. Math. Stat. Psychol. 67 30–48. 10.1111/bmsp.12002 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Luke A., Woods A., Weir K. (2013). Curriculum, syllabus design, and equity: A primer and model. Milton Park: Routledge. 10.4324/9780203833452 [ CrossRef ] [ Google Scholar ]
  • Marcotte A. M., Hintze J. M. (2009). Incremental and predictive utility of formative assessment methods of reading comprehension. J. Sch. Psychol. 47 315–335. 10.1016/j.jsp.2009.04.003 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Martens B., Eckert T., Begeny J., Lewandowski L., DiGennaro Reed F., Montarello S., et al. (2007). Effects of a fluency-building program on the reading performance of low-achieving second and third grade students. J. Behav. Educ. 16 38–53. 10.1007/s10864-006-9022-x [ CrossRef ] [ Google Scholar ]
  • McCurdy B. L., Shapiro E. S. (1992). A comparison of teacher-, peer-, and self-monitoring with curriculum-based measurement in reading among students with learning disabilities. J. Spec. Educ. 26 162–180. 10.1177/002246699202600203 [ CrossRef ] [ Google Scholar ]
  • McLaughlin M. (2012). Reading comprehension: What every teacher needs to know. Read. Teach. 65 432–440. 10.1002/TRTR.01064 [ CrossRef ] [ Google Scholar ]
  • Miller D. M., Scott C. E., McTigue E. M. (2018). Writing in the secondary-level disciplines: A systematic review of context, cognition, and content. Educ. Psychol. Rev. 30 83–120. 10.1007/s10648-016-9393-z [ CrossRef ] [ Google Scholar ]
  • Moher D., Liberati A., Tetzlaff J., Altman D. G., Group P. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med. 6 : e1000097 . 10.1371/journal.pmed.1000097 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mok M. M. C. (2012). “ Assessment reform in the Asia-Pacific region: The theory and practice of self-directed learning oriented assessment ,” in Self-directed learning oriented assessments in the Asia-Pacific. Education in the Asia-Pacific region: Issues, concerns and prospects , ed. Mok M. (Dordrecht: Springer; ), 3–22. 10.1007/978-94-007-4507-0_1 [ CrossRef ] [ Google Scholar ]
  • Moss C. M., Brookhart S. M. (2009). Advancing formative assessment in every classroom: A guide for instructional leaders. Alexandria, VA: ASCD. [ Google Scholar ]
  • OECD (2008). Assessment for learning formative assessment. Paris: OECD. [ Google Scholar ]
  • Orwin R. G. (1983). A fail-safe N for effect size in meta-analysis. J. Educ. Stat. 8 157–159. 10.2307/1164923 [ CrossRef ] [ Google Scholar ]
  • Palmer E., Devitt P. (2014). The assessment of a structured online formative assessment program: A randomised controlled trial. BMC Med. Edu. 14 : 8 . 10.1186/1472-6920-14-8 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Peters M. T., Hebbecker K., Souvignier E. (2021). Effects of providing teachers with tools for implementing assessment-based differentiated reading instruction in second grade. Assess. Eff. Interv. 47 , 157–169. 10.1177/15345084211014926 [ CrossRef ] [ Google Scholar ]
  • Pigott T. D., Polanin J. R. (2019). Methodological guidance paper: High-quality meta-analysis in a systematic review . Rev. Educ. Res. 90 , 24–46. 10.3102/0034654319877153 [ CrossRef ] [ Google Scholar ]
  • Polanin J. R., Pigott T. D. (2015). The use of meta-analytic statistical significance testing. Res. Synth. Methods 6 63–73. 10.1002/jrsm.1124 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Popham W. J. (2008). Formative assessment: Seven stepping-stones to success. Prin. Leadersh. 9 16–20. [ Google Scholar ]
  • Roskos K., Neuman S. B. (2012). Formative assessment: Simply, no additives . Read. Teach . 65 , 534–538. [ Google Scholar ]
  • Sanchez C. E., Atkinson K. M., Koenka A. C., Moshontz H., Cooper H. (2017). Self-grading and peer-grading for formative and summative assessments in 3rd through 12th grade classrooms: A meta-analysis. J. Educ. Psychol. 109 1049–1066. 10.1037/edu0000190 [ CrossRef ] [ Google Scholar ]
  • Sandelowski M., Voils C. I., Leeman J., Crandell J. L. (2012). Mapping the mixed methods-mixed research synthesis terrain. J. Mix. Methods Res. 6 317–331. 10.1177/1558689811427913 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schmidt F. L., Oh I. S., Hayes T. L. (2009). Fixed-versus random-effects models in meta-analysis: Model properties and an empirical comparison of differences in results. Br. J. Math. Stat. Psychol. 62 97–128. 10.1348/000711007X255327 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sedlmeier P., Gigerenzer G. (1989). Do studies of statistical power have an effect on the power of studies? Psychol. Bull. 105 309–316. 10.1037/0033-2909.105.2.309 [ CrossRef ] [ Google Scholar ]
  • See B. H., Gorard S., Lu B., Dong L., Siddiqui N. (2021). Is technology always helpful?: A critical review of the impact on learning outcomes of education technology in supporting formative assessment in schools. Res. Pap. Educ. 1–33. 10.1080/02671522.2021.1907778 [ CrossRef ] [ Google Scholar ]
  • Shadish W. R., Cook T. D., Campbell D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton, Mifflin and Company. [ Google Scholar ]
  • Shimojima Y., Arimoto M. (2017). Assessment for learning practices in Japan: Three steps forward, two steps back. Assess. Matters 11 : 2017 . 10.18296/am.0023 [ CrossRef ] [ Google Scholar ]
  • Simmons D. C., Kim M., Kwok O. M., Coyne M. D., Simmons L. E., Oslund E., et al. (2015). Examining the effects of linking student performance and progression in a Tier 2 kindergarten reading intervention. J. Learn. Disabil. 48 255–270. 10.1177/0022219413497097 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Slavin R. E. (2013). Effective programmes in reading and mathematics: Lessons from the best evidence encyclopaedia. Sch. Eff. Sch. Improv. 24 383–391. 10.1080/09243453.2013.797913 [ CrossRef ] [ Google Scholar ]
  • Slavin R. E., Cheung A. C. K. (2019). Evidence-based reform in education: Responses to critics. Sci. Insigt. Edu. Front. 2 65–69. 10.15354/sief.19.ar027 [ CrossRef ] [ Google Scholar ]
  • Slavin R., Smith D. (2009). The relationship between sample sizes and effect sizes in systematic reviews in education . Educ. Eval. Policy Anal. 31 , 500–506. [ Google Scholar ]
  • Slavin R. E., Cheung A. C. K., Holmes G., Madden N. A., Chamberlain A. (2013). Effects of a data-driven district reform model on state assessment outcomes. Am. Educ. Res. J. 50 371–396. 10.3102/0002831212466909 [ CrossRef ] [ Google Scholar ]
  • Snow R. E. (1986). Individual differences and the design of educational programs. Am. Psychol. 41 : 1029 . 10.1037/0003-066X.41.10.1029 [ CrossRef ] [ Google Scholar ]
  • Spector J. M., Ifenthaler D., Samspon D., Yang L., Mukama E., Warusavitarana A., et al. (2016). Technology enhanced formative assessment for 21st century learning . Educ. Technol. Soc. 19 , 58–71. [ Google Scholar ]
  • Steiner-Khamsi G. (2014). Cross-national policy borrowing: Understanding reception and translation. Asia Pac. Educ. Rev. 34 153–167. 10.1080/02188791.2013.875649 [ CrossRef ] [ Google Scholar ]
  • Sterne J. A., Becker B. J., Egger M. (2005). “ The funnel plot ,” in Publication bias in meta-analysis: Prevention, assessment and adjustments , eds Rothstein H. R., Sutton A. J., Borenstein M. (Chichester: Wiley; ), 75–98. 10.1002/0470870168.ch5 [ CrossRef ] [ Google Scholar ]
  • Stigler J. W., Hiebert J. (1998). Teaching is a cultural activity. Teach. Educ. 22 4–11. [ Google Scholar ]
  • Stigler J. W., Hiebert J. (2009). The teaching gap: Best ideas from the world’s teachers for improving education in the classroom. New York, NY: Simon and Schuster. [ Google Scholar ]
  • Sung Y.-T., Chang K.-E., Liu T.-C. (2016). The effects of integrating mobile devices with teaching and learning on students’ learning performance: A meta-analysis and research synthesis. Comput. Educ. 94 252–275. 10.1016/j.compedu.2015.11.008 [ CrossRef ] [ Google Scholar ]
  • Tomasik M. J., Berger S., Moser U. (2018). On the development of a computer-based tool for formative student assessment: Epistemological, methodological, and practical issues. Front. Psychol. 9 : 2245 . 10.3389/fpsyg.2018.02245 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tomlinson C. A. (2001). How to differentiate instruction in mixed-ability classrooms. Alexandria, VA: ASCD. [ Google Scholar ]
  • Topping K. J., Fisher A. M. (2003). Computerised formative assessment of reading comprehension: Field trials in the UK. J. Res. Read. 26 267–279. 10.1111/1467-9817.00202 [ CrossRef ] [ Google Scholar ]
  • Tsai F.-H., Tsai C.-C., Lin K.-Y. (2015). The evaluation of different gaming modes and feedback types on game-based formative assessment in an online learning environment. Comput. Educ. 81 259–269. 10.1016/j.compedu.2014.10.013 [ CrossRef ] [ Google Scholar ]
  • Van der Kleij F. M., Cumming J. J., Looney A. (2017). Policy expectations and support for teacher formative assessment in Australian education reform. Assess. Educ. Princ. Policy Pract. 25 620–637. 10.1080/0969594X.2017.1374924 [ CrossRef ] [ Google Scholar ]
  • Wang A., Firmender J. M., Power J. R., Byrnes J. P. (2016). Understanding the program effectiveness of early mathematics interventions for prekindergarten and kindergarten environments: A meta-analytic review. Early Educ. Dev. 27 692–713. 10.1080/10409289.2016.1116343 [ CrossRef ] [ Google Scholar ]
  • Wang T. (2008). Web-based quiz-game-like formative assessment: Development and evaluation. Comput. Educ. 51 1247–1263. 10.1016/j.compedu.2007.11.011 [ CrossRef ] [ Google Scholar ]
  • What Works Clearinghouse (2020). What works clearinghouse standards handbook, version 4.1 ed. Washington, DC: Institute of Education Sciences: National Center for Education Evaluation and Regional Assistance. [ Google Scholar ]
  • Wiliam D. (2011). What is assessment for learning? Stud. Educ. Evaluation 37 3–14. 10.1016/j.stueduc.2011.03.001 [ CrossRef ] [ Google Scholar ]
  • Wiliam D., Thompson M. (2007). “ Integrating assessment with learning. What will it take to make it work? ,” in The future of assessment , ed. Dwyer C. A. (New York, NY: Routledge; ), 53–82. 10.4324/9781315086545-3 [ CrossRef ] [ Google Scholar ]
  • Wisniewski B., Zierer K., Hattie J. (2019). The power of feedback revisited: A meta-analysis of educational feedback research. Front. Psychol. 10 : 3087 . 10.3389/fpsyg.2019.03087 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Xu Y., Harfitt G. (2018). Is assessment for learning feasible in large classes? Challenges and coping strategies from three case studies. Asia Pacific J. Educ. 47 472–486. 10.1080/1359866X.2018.1555790 [ CrossRef ] [ Google Scholar ]
  • Yan Z., Chiu M. M., Ko P. Y. (2020). Effects of self-assessment diaries on academic achievement, self-regulation, and motivation. Assess. Educ. Princ. Policy Pract. 27 562–583. 10.1080/0969594X.2020.1827221 [ CrossRef ] [ Google Scholar ]
  • Zhang S., Thompson N. (2004). DIALANG: A diagnostic language assessment system. Can. Mod. Lang. Rev. 61 290–293. 10.1353/cml.2005.0011 [ CrossRef ] [ Google Scholar ]
