With the very best of intentions, assessment has gone rogue.
It’s hard to imagine now, but when I started teaching in the late 80s, we didn’t really do assessment. We didn’t do much by way of marking, there were no SATS, no data drops and GCSE results didn’t get turned into league tables. A few years later I remember being incredibly excited by the idea that school improvement should be based on actual data about what was and wasn’t going well, as opposed to an unswerving belief in triple mounting as the benchmark of best practice. That seemed such a modern, progressive idea that would really help schools focus on the right kind of things. Around about the same time came ideas about the power of assessing for learning. Now we would actually know, rather than just assert, what really was effective practice. Henceforth we would teach children what they really needed to learn. A bright new future beckoned. I was an enthusiast.
The ‘father’ of sociology Max Weber talks about routinisation. All charismatic movements have to change in order to ensure their long-term survival. But in changing they must give up their definitive, charismatic qualities. Instead of exciting possibilities we get routines, policies and KPIs as the charisma is, of necessity, institutionalised.
Exciting ideas are all very well, but of course they need, to use a ghastly word, ‘operationalising’. But what happens over time is that the originally revolutionary impulse becomes so well established in systems and routines that these become more important than the original idea. Powerful ideas arise to address specific problems. Once routinised, the specific problem can get forgotten. Instead, we get unthinking adherence to a set of practices, divorced from reflection on whether or not those practices actually serve the purposes they were set up to address.
One of the things that has gone wrong with assessment is that it has morphed into a single magical big ‘thing’ that schools must do rather than a repertoire of different practices thoughtfully employed in different circumstances. The performance of assessment rituals is perceived as creating the reality of educational ‘righteousness’ – by doing certain things, like data drops and targets and marking and so on and so forth, a school becomes good, or at the very least avoids being bad. Not having some big system becomes unthinkable.
But assessment is not one thing. It is not a ritual to be performed. Assessment is a tool, or rather a set of tools, not an end in itself. Assessment is the process of doing something in order to find something out and then doing something as a result of having that new information. Because there are lots of different things we might want to find out, and lots of different ways we might seek to find that information, assessment cannot be one thing. The term assessment covers a range of different tools, all with different purposes. Whenever we are tempted to assess something, we should ask ourselves what it is we are trying to find out and what will be done differently as a result of having this information. If we can’t answer those two questions, we are on a hiding to nothing.
So what are these different purposes? The familiar language of formative and summative assessment – or more correctly – formative and summative inferences drawn from assessments is a helpful starting point. Formative assessment helps guide further learning; summative assessment evaluates learning at the end of a period of study by comparing it against a standard or benchmark.
But if we are to remind ourselves about the actual reasons why we might want to assess something, I think we need to expand beyond these two categories. I’ve come up with six: three that are different kinds of formative assessment and three that are summative. By being clear about the purpose of each different type and not mixing them up, we can get assessment back to being a powerful set of tools that can be used thoughtfully where and when appropriate.
Formative assessment includes:
Diagnostic assessment which provides teachers with information which enables them to diagnose individual learning needs and plan how to help pupils make further progress. Diagnostic assessment is mainly for teachers rather than pupils. If a pupil does not know enough about a topic, then they do not need feedback, they need more teaching. Feedback is for the teacher so they can adapt their plans. Trying to teach a child who does not know how to do something by giving the kind of feedback that involves writing a mini essay on their work is not only incredibly time consuming for the teacher, it is also highly unlikely to be effective. Further live teaching that addresses problem areas in subsequent lessons is going to do much more to address a learning issue than performative marking rituals.
Diagnostic assessment involves checking for understanding:
- In the moment, during lessons, so that teachers can flex their teaching on the spot to clarify and address misconceptions.
- After lessons, through looking at pupils’ work, in order to plan subsequent lessons to meet pupil needs.
- At the end of units of work, in order to evaluate how successful the teaching of a particular topic has been and what might need to be improved the next time this unit is taught. An end of unit assessment of some sort is one possible way of doing this. Another might be looking through children’s books or using a pupil book study approach.
- In the longer term, in order to check what pupils have retained over time, so that we can provide opportunities for revisiting and consolidating learning that has been forgotten.
Diagnostic assessment should not be conflated with motivational assessment or pupil self-assessment. A lot of the problems with assessment have arisen because the various kinds of formative assessment have been lumped together into one thing alongside a huge emphasis on evidencing that they have taken place. This has led to an obsession with teachers physically leaving an evidence trail by putting their ‘mark’ on pupils’ work – in rather the same way that cats mark out their territory through leaving their scent on various trees.
Diagnostic assessment is assessment for teaching. The next two forms of formative assessment are assessment for learning. Assessment for teaching is probably the most powerful of all forms of assessment and yet has been overlooked in favour of AfL approaches selected mainly for their visibility.
Motivational assessment provides pupils (or their parents/carers) with information about what they have done well and what they can do to improve future learning. For motivational assessment to be effective in improving future learning, it must tell the pupil something that is within their power to do something about. Telling a child to ‘include more detail’ when they do not know more detail is demotivating and counterproductive. To use the familiar example from Dylan Wiliam, there is no point in telling a child to ‘be more systematic in their scientific enquiries’ because if they knew how to be systematic, they would have done it in the first place.
Only where the gap between actual and desired performance is small enough for the pupil to address it with no more than a small nudge, can feedback be motivating. On the other hand, feedback about effort, attendance, behaviour or homework could provide information that may have the potential to motivate pupils to make different choices.
Pupil self-assessment: Pupil agency, resilience and independence can be built by teaching subject-specific metacognitive self-assessment strategies. Teaching pupils about the power of retrieval practice and how they can use this to enhance their learning is a very powerful strategy and should form a central plank of each pupil’s self-assessment repertoire. Retrieval practice is not one thing. There are a range of ways of doing it. Younger pupils benefit from a degree of guided recall, whereas as children get older, more emphasis on free recall is more likely to be effective.
Pupils should also be taught strategies for checking their own work – for example, monitoring writing for transcription errors, reading written work aloud to check for sense and clarity, using inverse operations in maths to check answers, and monitoring their comprehension when reading, then rereading sections when they notice that what they have read does not make sense. Pupils need to be given time to use these tools routinely to check and improve their work.
Summative assessment includes:
Assessment for certification. This includes exams and qualifications. Some of these – a grade 5 music exam, for example – state that a certain level of performance has been achieved. Others, such as A levels and to an extent GCSEs, are rationing mechanisms to determine access to finite resources in a relatively fair way. Unfortunately, some of these assessments have been used evaluatively. This is not what these qualifications are designed for and all sorts of unhelpful and unintended consequences fall out of using qualifications as indicators of school quality. In particular, it distorts the profession’s understanding of what assessment looks like and leads to the proliferation of GCSE-like wannabe assessments used throughout secondary schools.
Evaluative assessment enables schools to set targets and benchmark their performance against a wider cohort. Evaluative assessment can also feed into system-wide data allowing MATs, Local Authorities and the DfE to monitor and evaluate the performance of the school system at an individual school and whole system level.
It is perfectly reasonable for large systems to seek to gather information about performance, as long as this is done in statistically literate ways. This generally means using standardised assessments and being aware of their inherent limitations. Just because we want to be able to ‘measure’ something, doesn’t mean it is actually possible. (Indeed, I have a lifelong commitment to eradicate the word ‘measure’ from the assessment lexicon.) Standardised assessments have a degree of error (as do all assessments – though standardised assessments at least have the advantage of knowing the likely range of this error). As a result, the inferences we are able to make from them are more reliable when talking about attainment than progress because progress scores involve the double whammy of two unreliable numbers. They are also far more reliable at a cohort level than for making inferences about individuals since over and under performance by individuals will balance each other out when considering the performance of a cohort as a whole.
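The arithmetic behind those last two points can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the text above: each score is taken to have a known standard error of measurement, the errors are treated as independent, and the figure of 5 points is made up for the example. The function names are hypothetical.

```python
import math


def progress_error(sem_first: float, sem_second: float) -> float:
    """Standard error of a progress score (second score minus first).

    Assumes the two measurement errors are independent, so their
    variances add when one score is subtracted from the other.
    """
    return math.sqrt(sem_first ** 2 + sem_second ** 2)


def cohort_mean_error(sem: float, cohort_size: int) -> float:
    """Standard error of a cohort's mean score due to measurement error.

    Assumes every pupil's score carries the same independent error,
    so individual over- and under-estimates partly cancel out.
    """
    return sem / math.sqrt(cohort_size)


if __name__ == "__main__":
    sem = 5.0  # illustrative value only, not taken from any real test
    print("one attainment score:", sem)
    print("progress between two scores:", round(progress_error(sem, sem), 2))
    print("mean of a cohort of 100:", round(cohort_mean_error(sem, 100), 2))
```

On these assumptions a progress score is noticeably less certain than either of the two scores it is built from, while the average for a cohort of 100 is far more certain than any one pupil's score.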
Since standardised assessments do not exist for many subjects, it is not possible to evaluate performance for, say, geography in the same way as it is for maths. Non-standardised assessments that a school devises might give the school useful information – for example, they could tell the school how successfully their curriculum has been learnt – but they don’t allow for reliable inferences about performance in geography beyond that school.
Given these limitations – the unreliability at individual pupil level, the unreliability inherent in evaluating progress and the unavailability of standardised assessments in most subjects, schools should think very carefully about any system for tracking pupil attainment or progress. By all means have electronic data warehouses of attainment information but be very aware of what the information within can and can’t tell you. I’d recommend reading Dataproof Your School to make sure you are fully aware of the perils and pitfalls involved in seeking to make inferences from data.
What is more, summative assessment in reading is notoriously challenging since reading comprehension tests suffer from construct-irrelevant variance. In other words, they assess things other than reading comprehension such as vocabulary and background knowledge. More reliable inferences could be made were there standardised assessments of reading fluency. However, the one contender to date that could do this – the DIBELS assessment – explicitly rules out its use to evaluate performance of institutions.
Evaluative assessment is just one type of assessment with a limited, narrow purpose. It should not become the predominant form of assessment.
Informative assessment enables schools to report information about performance relative to other pupils to parents/carers, as well as information to help older pupils make choices about the examination courses, qualifications and careers. This is the most challenging aspect to get right when seeking to develop an assessment system that avoids the problems of previous practice. Often, schools use the same system that is used for evaluative assessment for accountability purposes. But evaluative assessment is most reliable when talking about large groups of pupils, not individuals, so where schools share standardised scores, they need to caveat this with an explanation about the limits of accuracy.
Let’s ask ourselves: what is it that parents want to find out about their child?
Most parents want to know:
- Is my child happy?
- Is my child trying hard?
- How good are they compared to what you would expect for a child of this age?
- What can I do to help them?
However, parents do not necessarily want to have the answer to all of these questions in all subjects all of the time.
The first question is obviously important and schools will have a variety of ways of finding this out. It is probably most pressing when a child starts at a school. For example, it would be an odd secondary school that didn’t seek to find out if their new year 7s had settled in well at some point during the autumn term.
The second question involves motivational assessment. Schools sometimes have systems of effort grades. These can work well where the school has worked hard with staff to agree narrative descriptors of what good effort actually involves and what it means to improve effort. For example, as well as attendance and punctuality, this could include the extent to which pupils
- Monitor their own learning for understanding and ask for help when unsure or stuck
- Contribute to paired or group tasks
- Show curiosity
- Show a positive attitude to homework
- Work independently
Thus they create a metalanguage that allows a shared understanding of what it means for a child to work effortfully. This can then be shared with pupils and parents. This metalanguage is portable between subjects. To a large degree, to work effortfully in Spanish involves the same behaviours as working effortfully in art. The metalanguage provides a short cut to describe what those behaviours are and, where necessary, how they could be further built upon. If there is a disparity between subjects, it allows for meaningful conversation about what it is specifically that the child isn’t doing in a particular subject that they could address.
If this work of developing a shared understanding does not take place and individual teachers are just asked to rate a child on a 4-point scale, then inevitably some teachers will grade children more harshly than others. I am sure I am not the only parent who has interrogated their child as to why their effort is only 3 in geography, yet it is 4 in everything else, when maybe the geography teacher reserves 4 for truly exceptional behaviour whereas the others score 4 for generally fine.
But it’s the third question that is really challenging. Schools sometimes avoid this altogether and talk about effort and what wonderful progress a child has made, which is all well and good but can go horribly wrong if no one has ever had an honest conversation with parents about how their child’s performance compares with what is typical. It shouldn’t come as a surprise to parents if their child gets 2s and 3s at GCSE, for example. This might represent significant achievement and brilliant progress, but parents should be aware that, relatively speaking, their child is finding learning in this subject more challenging than many of their peers.
However, many schools go to the other extreme and give parents all sorts of numerical information that purports to report with impressive accuracy how their child is doing. The problem is that this accuracy is not only entirely spurious but rests on teachers spending valuable curriculum time on assessment activities and then even more valuable leisure time marking these assessments. And why? Just so that parents can be served up some sort of grade or level at regular intervals.
Grades or levels are important for qualifications because they represent a shared metalanguage, a shared currency that opens – or closes – doors to further study or jobs. Pandemics aside, considerable statistical modelling goes into making sure grades have at least some sort of consistency between years. Schools, however, do not need to try to generate assessments that can then be translated into some kind of metalanguage that is translatable across subjects. The earlier example of effort worked because effort is portable and comparable. It is possible to describe the effort a child habitually makes in Spanish and in DT and be talking about the same observable behaviours. This is not the same for attainment. There isn’t some generic, context-free thing called standards of attainment that can be applied from subject to subject. We can measure length in a variety of different contexts because we have an absolute measure of a metre against which all other metres can be compared. There isn’t an absolute standard grade 4 in a vault at Ofqual. Indeed, some subjects, such as maths, assess in terms of difficulty whereas others, such as English, assess in terms of quality. Even within the same subject it is not straightforward to compare standards in one topic with another. Attainment in athletics might not bear any relation to attainment in swimming or dance, for example, let alone mean the same sort of standard of attainment as in physics. So even if it were desirable for schools to communicate attainment to parents via a metalanguage, it wouldn’t actually communicate anything of any worth.
Yet in many schools the feeling persists that unless there is a conditionally formatted spreadsheet somewhere, learning cannot be said to have taken place. Learning is not real until it has been codified and logged. But schools are not grade farms that exist to grow crops of assessment data. What we teach children is inherently meaningful and does not acquire worth or value through being assessed and labelled, let alone assessed and labelled in a self-deceiving, spurious way.
But if we do not have a metalanguage of some sort, how can we communicate to parents how well their child is doing?
First of all, the idea that telling parents that their child is working at ‘developing plus’, at a grade 3 or whatever other language we use is helpful because it uses a shared language is fanciful. The vast majority of parents will have no idea whether a grade 3 or developing plus or whatever is any good. Even if they do, we are very likely misleading parents by purporting to share information with an accuracy that it just can’t have. If we tell parents that their child is grade 3c in RE but grade 3b in science, does that actually mean their RE is weaker than their science? If in the next science assessment the child gets a 3c, have they actually regressed? Do they really know less than they did previously? And in any case, is a 3b good, bad or indifferent?
Nor is the use of metalanguage particularly useful for teachers. What helps teachers teach better is knowing the granular detail of what a child can and can’t do. Translating performance into a metalanguage by averaging everything out removes exactly the detail that makes assessment useful. Teachers waste time translating granular assessment information into their school’s metalanguage, then meeting with leaders who want to know why such and such a child is flagging as behind. They then have to translate back from the metalanguage into the granular to explain what the problem areas are. All this just because conditionally formatted spreadsheets give an illusion of rigour and dispassionate analysis.
While most parents will probably want to know how well their child is doing relative to what might be typical for a child of their age, this does not mean parents want this information for every subject every term. Secondary schools in particular seem to have been sucked into a loop of telling parents every term about attainment in every subject. Not only is this not necessary, it also actively undermines standards in subjects with lesser teaching time. Take music for example. A child might get 1 lesson a week in music and 4 lessons a week in maths. If both music and maths have to summatively assess children at the same frequency, then a disproportionate amount of time that could be used for teaching music will be used instead to assess it.
Instead, schools could have a reporting rota system. For example, in a secondary school context it might look something like this:
October year 7: information about how the child is settling.
Effort descriptors for 4 subjects
December year 7: attainment information for English, maths and history
Effort descriptors for 4 other subjects
March year 7: attainment information for science, geography and languages
Effort descriptors for 4 other subjects
Art and DT exhibition.
July year 7: attainment information for RE and computing, plus English and maths standardised scores
Effort descriptors for all subjects
with a similar pattern in year 8 and year 9, though with information for all subjects coming earlier in the year for year 9 to inform children making their options.
This reduces workload and allows teaching time to focus on teaching rather than generating assessments to feed a hungry data system. It does not mean that teachers can’t round off a topic with a final task that brings together various strands that have been taught over a series of lessons if this would enhance learning. It makes this a professional decision. It may be that writing an essay or doing a test or making a product or doing a performance gives form and purpose to a unit of work. And it may be that the teacher then gives feedback about strengths and areas to work on. But the timing of such set pieces should be determined by the inner logic of the curriculum and not shoehorned into a reporting schedule. And they may not be necessary at all. Some subjects by their very nature need to be shared with an audience. Rather than trying to grade performance in art or music or drama, have events that showcase everyone’s work and to which parents are invited. As well as celebrating achievement, this should give parents the opportunity to see a range of work and draw their own conclusions about how well their child is doing compared to their peers.
There is one metalanguage that could potentially be used to report attainment that is portable between subjects: the language of maths. If we are trying to provide a meaningful answer to the question ‘how good is my child compared to what you would expect for a child of this age?’ then we are talking about making a comparative evaluation. Where they exist, standardised assessments can be used. These allow parents to understand not just how their child is doing in comparison to their class but in comparison to a national sample. There is no point in doing this, though, unless the assessment assesses what you have actually taught them. This sounds obvious, but I’ve heard many a conversation with parents about how their child got a low mark because lots of the test was on fractions, but we haven’t taught fractions yet!
For those subjects which don’t have standardised assessments and where it makes sense to do so, assessments of what has actually been taught can be marked and given a percentage score or score out of ten. There will be a range of scores within the class or year group. Where the child lies within that range can be communicated by sharing the child’s score, the year group average, and possibly the range of scores. In the same way, standardised scores – which in their raw form may not make much sense to most parents – can be reported in terms of where the child lies on the continuum from well above average to well below average.
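As a rough sketch of what that might look like in practice, the snippet below summarises one child’s score alongside the year group average and range, and turns a standardised score into a plain-language description. The function names, the example marks and, above all, the cut-off points for each band are assumptions made up for illustration. A school would need to take its bands from the guidance published with whichever standardised assessment it actually uses.

```python
from statistics import mean


def summarise_score(child_score: float, year_group_scores: list[float]) -> dict:
    """Place one child's score alongside the year group average and range."""
    return {
        "child": child_score,
        "average": round(mean(year_group_scores), 1),
        "lowest": min(year_group_scores),
        "highest": max(year_group_scores),
    }


# Hypothetical cut-offs for a standardised score centred on 100.
# These are illustrative only and not taken from any real assessment.
BANDS = [
    (115, "well above average"),
    (108, "above average"),
    (93, "average"),
    (85, "below average"),
]


def describe_standardised(score: float) -> str:
    """Translate a standardised score into a plain-language band."""
    for cutoff, label in BANDS:
        if score >= cutoff:
            return label
    return "well below average"


if __name__ == "__main__":
    # Made-up marks out of ten for a small year group.
    print(summarise_score(6, [3, 5, 6, 8, 9]))
    print(describe_standardised(100))
```

Reporting a band rather than the raw number also avoids implying more precision than a single test score can carry.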
Some reading this part may flinch here, especially on behalf of children who find learning in a subject more challenging. Yet if we want to give parents information about how well their child is doing compared to what we might typically expect, we can’t get away from the fact that some children are doing much less well than their peers. What we can do, and should do, is not let this kind of reporting dominate what we understand assessment to be. It has its place, but it is just one tool among a range. Other tools, such as those that enable responsive teaching, share information about motivation, or that equip students with tools to assess and improve their own learning, are much more likely to actually make a difference.
 Some children may face additional barriers that make it much more challenging to make improvements in one or more of these areas. Young children are not responsible for their attendance, for example. Some children with SEMH need more than information to help them improve their behaviour.
 See Dylan Wiliam p35 in The ResearchED Guide to Assessment