Don’t mix the six! Thinking about assessment as six different tools with six different jobs.

 

With the very best of intentions, assessment has gone rogue.

It’s hard to imagine now, but when I started teaching in the late 80s, we didn’t really do assessment. We didn’t do much by way of marking, there were no SATs, no data drops, and GCSE results didn’t get turned into league tables. A few years later I remember being incredibly excited by the idea that school improvement should be based on actual data about what was and wasn’t going well, as opposed to an unswerving belief in triple mounting as the benchmark of best practice. That seemed such a modern, progressive idea that would really help schools focus on the right kind of things. Around about the same time came ideas about the power of assessment for learning. Now we would actually know, rather than just assert, what really was effective practice. Henceforth we would teach children what they really needed to learn. A bright new future beckoned. I was an enthusiast.

The ‘father’ of sociology, Max Weber, talks about routinisation. All charismatic movements have to change in order to ensure their long-term survival. But in changing they must give up their definitive, charismatic qualities. Instead of exciting possibilities we get routines, policies and KPIs as the charisma is, of necessity, institutionalised.

Exciting ideas are all very well, but of course they need, to use a ghastly word, ‘operationalising’. But what happens over time is that the originally revolutionary impulse becomes so well established in systems and routines that these become more important than the original idea. Powerful ideas arise to address specific problems. Once routinised, the specific problem can get forgotten. Instead, we get unthinking adherence to a set of practices, divorced from reflection on whether or not those practices actually serve the purposes they were set up to address.

One of the things that has gone wrong with assessment is that it has morphed into a single magical big ‘thing’ that schools must do rather than a repertoire of different practices thoughtfully employed in different circumstances. The performance of assessment rituals is perceived as creating the reality of educational ‘righteousness’ – by doing certain things, like data drops and targets and marking and so on and so forth, a school becomes good, or at the very least avoids being bad. Not having some big system becomes unthinkable.

But assessment is not one thing. It is not a ritual to be performed. Assessment is a tool, or rather a set of tools, not an end in itself. Assessment is the process of doing something in order to find something out and then doing something as a result of having that new information. Because there are lots of different things we might want to find out, and lots of different ways we might seek to find that information, assessment cannot be one thing. The term assessment covers a range of different tools, all with different purposes. Whenever we are tempted to assess something, we should ask ourselves: what is it we are trying to find out, and what will be done differently as a result of having this information? If we can’t answer those two questions, we are on a hiding to nothing.

So what are these different purposes? The familiar language of formative and summative assessment – or, more correctly, formative and summative inferences drawn from assessments – is a helpful starting point. Formative assessment helps guide further learning; summative assessment evaluates learning at the end of a period of study by comparing it against a standard or benchmark.

But if we are to remind ourselves about the actual reasons why we might want to assess something, I think we need to expand beyond these two categories. I’ve come up with six: three that are different kinds of formative assessment and three that are summative. By being clear about the purpose of each different type and not mixing them up, we can get assessment back to being a powerful set of tools that can be used thoughtfully where and when appropriate.

Formative assessment includes:

Diagnostic assessment provides teachers with information which enables them to diagnose individual learning needs and plan how to help pupils make further progress. Diagnostic assessment is mainly for teachers rather than pupils. If a pupil does not know enough about a topic, then they do not need feedback, they need more teaching. Feedback is for the teacher so they can adapt their plans. Trying to teach a child who does not know how to do something by giving the kind of feedback that involves writing a mini essay on their work is not only incredibly time consuming for the teacher, it is also highly unlikely to be effective. Further live teaching that addresses problem areas in subsequent lessons is going to do much more to address a learning issue than performative marking rituals.

Diagnostic assessment involves checking for understanding:

  • In the moment, during lessons, so that teachers can flex their teaching on the spot to clarify and address misconceptions.
  • After lessons, through looking at pupils’ work, in order to plan subsequent lessons to meet pupil needs.
  • At the end of units of work, in order to evaluate how successful the teaching of a particular topic has been and what might need to be improved the next time this unit is taught. An end of unit assessment of some sort is one possible way of doing this. Another might be looking through children’s books or using a pupil book study approach.
  • In the longer term, in order to check what pupils have retained over time, so that we can provide opportunities for revisiting and consolidating learning that has been forgotten.

Diagnostic assessment should not be conflated with motivational assessment or pupil self-assessment. A lot of the problems with assessment have arisen because the various kinds of formative assessment have been lumped together into one thing alongside a huge emphasis on evidencing that they have taken place.  This has led to an obsession with teachers physically leaving an evidence trail by putting their ‘mark’ on pupils’ work – in rather the same way that cats mark out their territory through leaving their scent on various trees.

Diagnostic assessment is assessment for teaching. The next two forms of formative assessment are assessment for learning.  Assessment for teaching is probably the most powerful of all forms of assessment and yet has been overlooked in favour of afl approaches selected mainly for their visibility.

Motivational assessment provides pupils (or their parents/carers) with information about what they have done well and what they can do to improve future learning. For motivational assessment to be effective in improving future learning, it must tell the pupil something that is within their power to do something about. Telling a child to ‘include more detail’ when they do not know more detail is demotivating and counterproductive. To use the familiar example from Dylan Wiliam, there is no point in telling a child to ‘be more systematic in their scientific enquiries’, because if they knew how to be systematic, they would have done it in the first place.

Only where the gap between actual and desired performance is small enough for the pupil to address it with no more than a small nudge can feedback be motivating. On the other hand, feedback about effort, attendance, behaviour or homework could provide information that may have the potential to motivate pupils to make different choices.[1]

Pupil self-assessment: Pupil agency, resilience and independence can be built by teaching subject-specific metacognitive self-assessment strategies. Teaching pupils about the power of retrieval practice and how they can use this to enhance their learning is a very powerful strategy and should form a central plank of each pupil’s self-assessment repertoire. Retrieval practice is not one thing; there are a range of ways of doing it. Younger pupils benefit from a degree of guided recall, whereas as children get older, a greater emphasis on free recall is likely to be effective.

Pupils should also be taught strategies for checking their own work – for example monitoring writing for transcription errors, reading written work aloud to check for sense and clarity, using inverse operations in maths to check answers, and monitoring their comprehension when reading, rereading sections when they notice that what they have read does not make sense. Pupils need to be given time to use these tools routinely to check and improve their work.

Summative assessment includes:

Assessment for certification. This includes exams and qualifications. Some of these – a grade 5 music exam, for example – state that a certain level of performance has been achieved. Others, such as A levels and to an extent GCSEs, are rationing mechanisms to determine access to finite resources in a relatively fair way. Unfortunately, some of these assessments have been used evaluatively. This is not what these qualifications are designed for, and all sorts of unhelpful and unintended consequences fall out of using qualifications as indicators of school quality. In particular, it distorts the profession’s understanding of what assessment looks like and leads to the proliferation of GCSE-like wannabe assessments used throughout secondary schools.

Evaluative assessment enables schools to set targets and benchmark their performance against a wider cohort. Evaluative assessment can also feed into system-wide data allowing MATs, Local Authorities and the DfE to monitor and evaluate the performance of the school system at an individual school and whole-system level.

It is perfectly reasonable for large systems to seek to gather information about performance, as long as this is done in statistically literate ways. This generally means using standardised assessments and being aware of their inherent limitations. Just because we want to be able to ‘measure’ something doesn’t mean it is actually possible. (Indeed, I have a lifelong commitment to eradicating the word ‘measure’ from the assessment lexicon.) Standardised assessments have a degree of error (as do all assessments – though standardised assessments at least have the advantage that the likely range of this error is known). As a result, the inferences we are able to make from them are more reliable when talking about attainment than progress, because progress scores involve the double whammy of two unreliable numbers.[2] They are also far more reliable at a cohort level than for making inferences about individuals, since over- and under-performance by individuals will balance each other out when considering the performance of a cohort as a whole.
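To put a number on that double whammy, here is a back-of-envelope illustration (my own numbers, purely illustrative, and assuming the two tests have independent errors):

```latex
% If two test scores x_1 and x_2 each carry an independent standard error
% of measurement (SEM) of \sigma, the progress score x_2 - x_1 carries:
\operatorname{SE}(x_2 - x_1) = \sqrt{\sigma^2 + \sigma^2} = \sigma\sqrt{2} \approx 1.4\,\sigma
```

So two tests each accurate to within about 3 standardised-score points give a progress figure accurate only to within about 4 points – and that is before asking whether the two tests even measure the same thing.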

Since standardised assessments do not exist for many subjects, it is not possible to evaluate performance in, say, geography in the same way as it is for maths. Non-standardised assessments that a school devises might give the school useful information – for example, they could tell the school how successfully their curriculum has been learnt – but they don’t allow for reliable inferences about performance in geography beyond that school.

Given these limitations – the unreliability at individual pupil level, the unreliability inherent in evaluating progress and the unavailability of standardised assessments in most subjects – schools should think very carefully about any system for tracking pupil attainment or progress. By all means have electronic data warehouses of attainment information, but be very aware of what the information within can and can’t tell you. I’d recommend reading Dataproof Your School to make sure you are fully aware of the perils and pitfalls involved in seeking to make inferences from data.

What is more, summative assessment in reading is notoriously challenging, since reading comprehension tests suffer from construct-irrelevant variance. In other words, they assess things other than reading comprehension, such as vocabulary and background knowledge. More reliable inferences could be made were there standardised assessments of reading fluency. However, the one contender to date that could do this – the DIBELS assessment – explicitly rules out its use to evaluate the performance of institutions.

Evaluative assessment is just one type of assessment with a limited, narrow purpose. It should not become the predominant form of assessment.

Informative assessment enables schools to report information about performance relative to other pupils to parents/carers, as well as information to help older pupils make choices about examination courses, qualifications and careers. This is the most challenging aspect to get right when seeking to develop an assessment system that avoids the problems of previous practice. Often, schools use the same system for this that is used for evaluative assessment for accountability purposes. But evaluative assessment is most reliable when talking about large groups of pupils, not individuals, so where schools share standardised scores, they need to caveat this with an explanation about the limits of accuracy.

Let’s ask ourselves: what is it that parents want to find out about their child?

Most parents want to know:

  • Is my child happy?
  • Is my child trying hard?
  • How good are they compared to what you would expect for a child of this age?
  • What can I do to help them?

However, parents do not necessarily want to have the answer to all of these questions in all subjects all of the time.

The first question is obviously important and schools will have a variety of ways of finding this out. It is probably most pressing when a child starts at a school. For example, it would be an odd secondary school that didn’t seek to find out if their new year 7s had settled in well at some point during the autumn term.

The second question involves motivational assessment. Schools sometimes have systems of effort grades. These can work well where the school has worked hard with staff to agree narrative descriptors of what good effort actually involves and what it means to improve effort. For example, as well as attendance and punctuality, this could include the extent to which pupils:

  • Monitor their own learning for understanding and ask for help when unsure or stuck
  • Contribute to paired or group tasks
  • Show curiosity
  • Show a positive attitude to homework
  • Work independently

Thus they create a metalanguage that allows a shared understanding of what it means for a child to work effortfully. This can then be shared with pupils and parents. This metalanguage is portable between subjects. To a large degree, to work effortfully in Spanish involves the same behaviours as working effortfully in art. The metalanguage provides a shortcut to describe what those behaviours are and, where necessary, how they could be further built upon. If there is a disparity between subjects, it allows for a meaningful conversation about what specifically the child isn’t doing in a particular subject that they could address.

If this work developing a shared understanding does not take place and individual teachers are just asked to rate a child on a 4-point scale, then inevitably some teachers will grade children more harshly than others. I am sure I am not the only parent who has interrogated their child as to why their effort is only 3 in geography, yet 4 in everything else – when maybe the geography teacher reserves a 4 for truly exceptional behaviour whereas the others award a 4 for generally fine.

But it’s the third question that is really challenging. Schools sometimes avoid this altogether and talk about effort and what wonderful progress a child has made, which is all well and good but can go horribly wrong if no one has ever had an honest conversation with parents about how their child’s performance compares with what is typical. It shouldn’t come as a surprise to parents if their child gets 2s and 3s at GCSE, for example. This might represent significant achievement and brilliant progress, but parents should be aware that, relatively speaking, their child is finding learning in this subject more challenging than many of their peers.

Many schools, however, go to the other extreme and give parents all sorts of numerical information that purports to report with impressive accuracy how their child is doing. The problem is that this accuracy is not only entirely spurious but rests on teachers spending valuable curriculum time on assessment activities and then even more valuable leisure time marking these assessments. And why? Just so that parents can be served up some sort of grade or level at regular intervals.

Grades or levels are important for qualifications because they represent a shared metalanguage, a shared currency that opens – or closes – doors to further study or jobs. Pandemics aside, considerable statistical modelling goes into making sure grades have at least some sort of consistency between years. Schools, however, do not need to try to generate assessments that can then be translated into some kind of metalanguage comparable across subjects. The earlier example of effort worked because effort is portable and comparable. It is possible to describe the effort a child habitually makes in Spanish and in DT and be talking about the same observable behaviours. This is not the same for attainment. There isn’t some generic, context-free thing called standards of attainment that can be applied from subject to subject. We can measure length in a variety of different contexts because we have an absolute measure of a metre against which all other metres can be compared. There isn’t an absolute standard grade 4 in a vault at Ofqual. Indeed, some subjects, such as maths, assess in terms of difficulty whereas others, such as English, assess in terms of quality. Even within the same subject it is not straightforward to compare standards in one topic with another. Attainment in athletics might not bear any relation to attainment in swimming or dance, for example, let alone indicate the same sort of standard of attainment as in physics. So even if it were desirable for schools to communicate attainment to parents via a metalanguage, it wouldn’t actually communicate anything of any worth.

Yet in many schools the feeling persists that unless there is a conditionally formatted spreadsheet somewhere, learning cannot be said to have taken place. Learning is not real until it has been codified and logged. But schools are not grade farms that exist to grow crops of assessment data. What we teach children is inherently meaningful and does not acquire worth or value through being assessed and labelled, let alone assessed and labelled in a self-deceiving, spurious way.

But if we do not have a metalanguage of some sort, how can we communicate to parents how well their child is doing?

First of all, the idea that telling parents that their child is working at ‘developing plus’, at a grade 3 or in whatever other language we use is helpful because it uses a shared language is fanciful. The vast majority of parents will have no idea whether a grade 3 or developing plus or whatever is any good. Even if they do, we are very likely misleading parents by purporting to share information with an accuracy that it just can’t have. If we tell parents that their child is grade 3c in RE but grade 3b in science, does that actually mean their RE is weaker than their science? If in the next science assessment the child gets a 3c, have they actually regressed? Do they really know less than they did previously? And in any case, is a 3b good, bad or indifferent?

Nor is the use of metalanguage particularly useful for teachers. What helps teachers teach better is knowing the granular detail of what a child can and can’t do. Translating performance into a metalanguage by averaging everything out removes exactly the detail that makes assessment useful. Teachers waste time translating granular assessment information into their school’s metalanguage, then meeting with leaders who want to know why such and such a child is flagging as behind, then translating back from the metalanguage into the granular to explain what the problem areas are. All this just because conditionally formatted spreadsheets give an illusion of rigour and dispassionate analysis.

While most parents will probably want to know how well their child is doing relative to what might be typical for a child of their age, this does not mean parents want this information for every subject every term. Secondary schools in particular seem to have been sucked into a loop of telling parents every term about attainment in every subject. Not only is this not necessary, it also actively undermines standards in subjects with less teaching time. Take music, for example. A child might get 1 lesson a week in music and 4 lessons a week in maths. If both music and maths have to summatively assess children at the same frequency, then a disproportionate amount of time that could be used for teaching music will be used instead to assess it: over a 12-week term, say, a one-lesson assessment swallows a twelfth of the music teaching but only a forty-eighth of the maths.

Instead, schools could have a reporting rota system. For example, in a secondary school context it might look something like this:

October year 7: information about how the child is settling.

Effort descriptors for 4 subjects

December year 7: attainment information for English, maths and history

Effort descriptors for 4 other subjects

Music concert

March year 7: attainment information for science, geography and languages

Effort descriptors for 4 other subjects

Art and DT exhibition.

July year 7: attainment information for RE and computing, plus English and maths standardised scores

Effort descriptors for all subjects

with a similar pattern in year 8 and year 9, though with information for all subjects coming earlier in the year for year 9 to inform children making their options.

This reduces workload and allows teaching time to focus on teaching rather than generating assessments to feed a hungry data system. It does not mean that teachers can’t round off a topic with a final task that brings together various strands that have been taught over a series of lessons, if this would enhance learning. It makes this a professional decision. It may be that writing an essay or doing a test or making a product or giving a performance gives form and purpose to a unit of work. And it may be that the teacher then gives feedback about strengths and areas to work on. But the timing of such set pieces should be determined by the inner logic of the curriculum and not shoehorned into a reporting schedule. And they may not be necessary at all. Some subjects by their very nature need to be shared with an audience. Rather than trying to grade performance in art or music or drama, have events that showcase the work of all pupils, to which parents are invited. As well as celebrating achievement, this gives parents the opportunity to see a range of work and draw their own conclusions about how well their child is doing compared to their peers.

There is one metalanguage that could potentially be used to report attainment that is portable between subjects: the language of maths. If we are trying to provide a meaningful answer to the question ‘how good is my child compared to what you would expect for a child of this age?’ then we are talking about making a comparative evaluation. Where they exist, standardised assessments can be used. These allow parents to understand not just how their child is doing in comparison to their class but in comparison to a national sample. There is no point in doing this, though, unless the assessment assesses what you have actually taught them. This sounds obvious, but I’ve heard many a conversation with parents about how a child got a low mark because lots of the test was on fractions, but we hadn’t taught fractions yet!

For those subjects which don’t have standardised assessments, and where it makes sense to do so, assessments of what has actually been taught can be marked and given a percentage score or a score out of ten. There will be a range of scores within the class or year group. Where the child lies within that range can be communicated by sharing the child’s score, the year group average, and possibly the range of scores. In the same way, standardised scores – which in their raw form may not make much sense to most parents – can be reported in terms of where the child lies on the continuum from well above average to well below average.
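For what it’s worth, here is a minimal sketch (my own illustration, not any particular school’s system) of both reporting ideas: placing a marked score against the year group average and range, and banding a standardised score along that continuum. The band cut-offs assume the common mean-100, standard-deviation-15 convention and are illustrative only – use whatever bands your assessment provider publishes.

```python
# Illustrative only: report a score against the cohort, and band a
# standardised score. Cut-offs assume the mean-100 / SD-15 convention.

def place_in_cohort(score: int, out_of: int, cohort: list[int]) -> str:
    """Describe a child's score alongside the year group average and range."""
    average = sum(cohort) / len(cohort)
    return (f"{score}/{out_of} (year group average {average:.0f}, "
            f"range {min(cohort)}-{max(cohort)})")

def band(standardised: int) -> str:
    """Place a standardised score on the well-below to well-above continuum."""
    cutoffs = [(69, "well below average"), (84, "below average"),
               (115, "average"), (130, "above average")]
    for upper, label in cutoffs:
        if standardised <= upper:
            return label
    return "well above average"

print(place_in_cohort(14, 20, [6, 9, 11, 12, 14, 15, 17, 19]))
print(band(92))  # -> "average"
```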

Some reading this may flinch here, especially for children who find learning in a subject more challenging. Yet if we want to give parents information about how well their child is doing compared to what we might typically expect, we can’t get away from the fact that some children are doing much less well than their peers. What we can do, and should do, is not let this kind of reporting dominate what we understand assessment to be. It has its place, but it is just one tool among a range. Other tools – those that enable responsive teaching, share information about motivation, or equip pupils to assess and improve their own learning – are much more likely to actually make a difference.


[1] Some children may face additional barriers that make it much more challenging to make improvements in one or more of these areas. Young children are not responsible for their attendance, for example. Some children with SEMH needs require more than information to help them improve their behaviour.

[2] See Dylan Wiliam, The researchED Guide to Assessment, p. 35.


The highs and lows of knowledge organisers: an end of year report

In January, after one term of us using knowledge organisers, I posted this blog about how our experiment with them was going. Six months later, the academic year over, I thought it might be useful to share my reflections upon what we’ve learnt along the way. Since January, the importance of schools taking a good, long look at the curriculum they offer has really come to the fore, thanks to those trendsetters down at Ofsted Towers. Amanda Spielman’s talk at the Festival of Education underlined what Sean Harford has been talking (and tweeting) about all year – stop obsessing about data (sort of) and the inevitable narrow focus on English and maths that necessitates[1]; the curriculum is where it is at these days, guys. So there is a lot of waking up and smelling the coffee going on as we begin to realise just how iconoclastic this message really is. The ramifications are huge and startling. It’s a bit like the emperor with no clothes suddenly berating us for our poor fashion sense. We feel indignant (the data nonsense was Ofsted driven after all), pleased (we always wanted a broader curriculum), terrified (are they asking to have their cake and eat it? – schools side-lined the rest of the curriculum for a reason and not on a whim – how possible is it to really go for quality in the other subjects when getting good SATs/GCSE results is still such a monumental struggle?) and woefully ill-prepared.

I’m going to focus on the ‘pleased’ bit. It’s not that I don’t share the indignation and the terror. The indignation we will just have to get over. A broader curriculum will only happen if Ofsted want a broader curriculum – such is the power they wield – so let’s try and move on from the exasperation we feel when the curriculum poachers turn curriculum gamekeepers. As for the terror, let’s keep on letting Amanda and Sean know why we are so scared. I wrote another blog a while back about the triple constraint – the idea (from engineering project management) that the three variables of time, cost and scope (a term which embraces both quality and performance specification) are constrained by one another. If you wish to increase the scope of a project by wanting quality in a broader range of areas than previously, then that will inevitably cost you either more time or more money. Time in education is relatively inelastic. We can’t just deliver the ‘project’ later. We can’t say we will get high standards across all areas of the curriculum by doing our GCSEs when the ‘children’ are 20 (though this school did try something along those lines. It didn’t end well.) So that leaves spending more on our project as the only other option. Mmmm, a few problems with that.

But I digress. Back to being pleased. I am really pleased. After all, we started on revamping our ‘afternoon’ subjects well before Ofsted started banging on about this. We did so not because of Ofsted but because a) developments from cognitive science make a very strong case for ensuring children are explicitly taught knowledge if they are to become critical thinkers and creative problem solvers and b) children are entitled to a knowledge-rich curriculum. I have become convinced of the moral duty to provide our children with a curriculum that ensures that they get their fair share of the rich cultural inheritance our nation and our world affords, an inheritance hitherto seen as the birthright of the rich and not the poor.

By sharing our experience so far, I hope I can save other schools some time (that precious commodity) by helping them avoid making the mistakes we did when we rolled out knowledge organisers and multiple choice quizzes last September.

A quick recap about what we did. We focused on what I am going to call ‘the big four’, i.e. the 4 ‘foundation’[2] subjects: history, geography, RE and science. In July 2016 I shared some knowledge organisers from other schools with the staff – almost all from secondary schools, as I could only find one example from a primary school at that point. Staff then attempted to write their own for these 4 subjects for the coming academic year. It seemed to me at the time that this would be a relatively straightforward thing to do. I was wrong, but more of that later. Our afternoon curriculum had been timetabled into 3-week blocks, with strict cut-offs once the 3 weeks had elapsed. This worked extremely well. It tightened planning – much less faff, much more deciding up front what really mattered and hitting the ground running with specific coverage in mind. It gave an excitement to the learning. Neither the children nor the teacher got bored by a topic that drifted on and on just because that half term was quite long. It also meant that subjects did not fall off the edge of the school year, never taught, because people had run out of time. I would highly recommend this way of structuring the delivery of most of the foundation subjects. Obviously it doesn’t work for PE (though a good case can be made for doing it in swimming), MFL or PSHE, which need to be done at least weekly, but that still leaves at least 3 afternoons for the other stuff.

The weekend before each block started, the children took home the knowledge organiser for the new block, the idea being that they read the KO, with their parents’ help where necessary. Then on Monday the teacher started to teach them the content, some of which some of them would have already read about at the weekend. The next weekend, the KOs went home again, along with a multiple choice quiz based on them, the answers to which were all (in theory) in the KO. These didn’t have to be handed in and the scores were not recorded, although in some classes children stuck the KO and each quiz in a homework book. The same procedure was repeated on the second weekend of the block. Then on the final Friday of each block, a multiple choice quiz was done and marked in class. The teacher took notice of the scores but we didn’t track them on anything. This is something we are changing this September, with a very simple Excel spreadsheet to record just the final end-of-unit quiz score.

Since we didn’t have KOs for computing, art or DT, I suggested that during these curriculum blocks children should take home the KO from a previous block, revise that and then do a quiz on it at the end of the art (or whatever) block. The idea being that by retrieving the knowledge at some distance from when it was originally taught, the testing effect would result in better long-term recall. However, as it was only a suggestion, and I didn’t really explain about the testing effect, and teachers are busy and the curriculum overfull, it just didn’t happen. From this September, I’ve explicitly specified what needs to be revisited when in our curriculum map. Towards the end of last year, I also gave over some staff meeting and SMT time to studying cognitive psychology, and this will continue next term with the revamp of our teaching and learning policy, which is being rewritten with the best insights from cognitive science explicitly in mind.

Then, in the dying days of term, in mid-July, the children took an end-of-year quiz in each of the 4 subjects which mixed up questions from all the topics they had studied that year. In the two weeks prior to this, children had revised from a mega KO, in effect a compilation of all previous KOs and quizzes that year. They had revised this in lessons (particularly helpful at the end of term when normal service is interrupted by special events, handover meetings and so forth) and at the weekend for homework. It hadn’t really been my intention to do this at the start of the year, but I confess to being a bit spooked by Ofsted reports that had (the lack of) assessment in the foundation subjects down as a key issue, something I wrote about here. But having done so, I think it is a good idea. For one, it gives the children another chance to revisit stuff they’ve learnt several months previously, so improving the likelihood that they will be able to recall this information in the longer term. Secondly, it gives these subjects status. We did the tests after our reports were written and parents’ meetings held. Next year I want to get the end-of-year scores (just a simple mark out of 10 or 15) on reports and shared with parents. The results from the end-of-year tests were interesting. In the main, almost all children did very well. Here are the results, expressed as average class percentages. I’m not going to tell you which year group is which, as my teachers might rightly feel a bit perturbed about this, so I’ve mixed up the order here, but it represents year groups 2-6.

History | RE  | Science | Geography
86%     | 93% | 85%     | 84%
79%     | 85% | 91%     | 82%
83%     | 95% | 87%     | n/a
75%     | 75% | 67%     | 74%
70%     | 76% | 66%     | n/a

One class was still studying their geography block when we took the tests, and another did Ancient Egypt as a mixed geography/history block, geography coming off somewhat the worse in this partnership – something I may not have noticed without this analysis, and which we are now changing for next year.

From this I notice that we seem to be doing something right in RE and that, by contrast, science isn’t as strong. The tests threw up some common errors; for example, children confusing evaporation and condensation, something we can make sure we work on. Looking at the class with the lowest results, it is striking that the average is depressed by a few children scoring really badly (4 out of 10, 5 out of 15), but these are not the children with SEN; they are generally children about whose attitude to learning we already have concerns. All the more reason to share these results with their parents.

Even so, the lowest score here is 66%, and that is without doing any recap once the block has finished until the very end of the year, something we will do next year. I don’t have anything to compare these results with, but my gut instinct is that in previous years children would have been hard pressed to remember two-thirds of what they had learnt that year, let alone 95% of it. As Kirschner and co remind us, if nothing has been changed in long-term memory, nothing has been learned.[3] Or as Joe Kirby puts it, ‘learning is remembering in disguise.’ So next year I’d like us to aim for an average around the 90% mark – mainly achieved by going back over tricky or easily confused content and by keeping a close eye on the usual suspects. Are they actually doing their revision at home?

So, after that lengthy preamble, what are the main pitfalls when using KOs and MCQs for the first time?

  1. Deciding which knowledge makes it onto a KO is hard, particularly in history and sometimes RE. One teacher did a KO on Buddhism that had enough information for a degree! In general, the less you know about something, the harder it is to make judicious choices, because you simply do not know what is and isn’t really important. In science it is pretty easy: go to BBC Bitesize for the relevant topic and use that. For history you actually have to decide how to cut a vast topic down to size. Who will do this deciding? The class teacher, the subject co-ordinator, the SLT or the head teacher? For what it’s worth, I’d start with the class teacher so they own the learning, but make sure that is scrutinised by someone else, someone who understands what is at stake here[4]. Quite a few primary schools have developed KOs this year, so look at these and adapt from there, rather than starting from scratch. I’m going to put ours on @Mr_P_Hillips’ one https://padlet.com/jack_helen12/czfxn9ft6n8o once I’ve removed any copyright-infringing images. It’s one thing using these images on something just used in one school, quite another putting them up on the web. There are some up already by other people, so do take a look. I definitely think this hive-mind approach to developing KOs at primary level is the way ahead. We are unlikely to have subject specialists for all the subjects in the curriculum in our individual schools, let alone ones who are up to date with the latest debates about what makes for a good curriculum. However, by combining forces across the edu-twittersphere, I’m sure we can learn from each other, refining each other’s early attempts until we get something we know is really good. We’ve revised ours twice this year, once in January after a term of writing ones that were too long, and then again in July with the benefit of hindsight.
  2. Seems obvious but… if you are using quizzes, make sure the answers are in the KO! Someone – a secondary school teacher, I think – tweeted a while back that KOs are only KOs if they can help children self-quiz. I think he was alluding to the grid sort of KO that looks like this (here’s an extract):
When did the ancient Greeks live? | about 3,000 years ago
When was Greek civilisation most powerful? | Between 800 BC and 146 BC.
Ancient Greece was not a single country but | was made up of many city states
Some examples of city states are | Athens, Sparta and Corinth
City states used to fight each other a lot. But if enemies not from Greece attacked | they all joined together to fight back
The first city states started | About 800 BC
All Greeks | spoke the same language and worshipped the same gods.
Ancient Greece is sometimes called | the ‘cradle of Western civilisation’
‘Cradle of Western civilisation’ means | The place where European culture all started
The climate in Greece is | Warm and dry
In ancient Greece most people earned their living by | Farming, fishing and trade
The two most powerful city states were | Athens and Sparta

 

As opposed to the same information presented as continuous prose, like this:

The ancient Greeks lived about 3,000 years ago.

Greek civilisation was most powerful between 800 BC and 146 BC.

Ancient Greece was not a single country but was made up of many city states such as Athens, Sparta and Corinth; but all Greeks spoke the same language and worshipped the same gods.

City states used to fight each other a lot. But if enemies who were not from Greece attacked, they all joined together to fight back.

Ancient Greece has been called ‘the cradle of Western civilisation’ because writing, art, science, politics, philosophy and architecture in Europe all developed from Greek culture.

Ancient Greece had a warm, dry climate, as Greece does today. Most people lived by farming, fishing and trade.

The idea with the grid being that children cover one half and write the answers (or questions) as a way of revising. I get this for secondary children, but it doesn’t seem suitable for primary-aged children – especially the younger ones. The grid is just too forbidding to read. And we don’t expect them to write out answers for homework to check themselves. Again, for younger children that would turn it into a chore rather than something we have found our children actually like doing. Maybe we might develop a grid alongside the continuous prose? (I did both for Ancient Greece to see which worked better, but went for the prose version in the end.) Maybe for years 5 and 6 only?

When we audited the KOs against the quizzes, we found that the quizzes sometimes asked questions that weren’t on the KO! We spent a couple of staff meetings putting that right, so I think that’s all sorted now, but if you spot any omissions when I finally do post our KOs and quizzes, do let me know. Keep thinking hive mind.

  3. If you think KOs are hard to write, wait until you try to write quizzes! The key to a good MCQ is that the other answers – the distractors, as they are known in the trade – are suitably plausible. Maybe some of our high scores were down to implausible distractors? However, a really good distractor can help you spot misconceptions, so distractors are really useful formatively.

Polar explorers (year 4, joint history/geography topic)

Question | Answer A | Answer B | Answer C
Which one of these is NOT a continent? | North America | Europe | Russia
Which one of these is NOT a country? | Argentina | Africa | Hungary
Pemmican is… | an animal that lives in water and has wings. | high energy food made of meat and fat. | high energy food made out of fish and protein.
Great Britain is surrounded by water so it is an… | island | Ireland | continent
If you travel north east from the UK you will reach… | Norway | Belgium | Austria
Shackleton’s ship was called… | The Antarctica | The Elephant | The Endurance
When did Henson and Peary make a mad dash for the North Pole? | 1909 | 1609 | 1979

 

I think this example has good distractors. I particularly like the way the common misconception that Africa is a country is addressed. With the dates, you may argue that children are using deduction rather than recall. I don’t think at this point that is a problem. Besides the fact that by having to think about the question their recall will have been strengthened anyway, we all know how hard it is for children to develop a sense of time. 2009 was the year many of year 4 were born, so if they think that happened a mere 30 years before they were born – when possibly their teacher was already alive – then we know their sense of chronology is still way out. But I would hope that most children would automatically dismiss this date and then be faced with a choice between 1609 and 1909. Some will just remember 1909, of course. But others might reason that 1609 is a really long time ago, before the Fire of London, whereas 1909 is only just over 100 years ago, and appreciate that while the story is set in the past, it’s not that long ago, and the technology needed to make the voyage far outstripped anything around in 1666. On the other hand, if they can reason that well about history, they probably already know it was 1909! When at primary level we try to get children to remember dates, it is in order to build up their internal timeline and relate events to one another. By the time children study this in year 4, they have previously learnt about the Magna Carta, the Fire of London, the Crimean War and World War 1 (the year 2 ‘nurses’ topic on Florence Nightingale, Mary Seacole and Edith Cavell), the Stone Age, the Iron Age, Ancient Egypt, the Romans, the Anglo-Saxons and the Vikings, as well as knowing that Jesus was born 2017 years ago (and hopefully beginning to understand BC and why the numbers go backwards). I would hope they would be able to group these into a sequence that was roughly accurate – that’s something else we should develop some assessments for. Elizabeth Carr and Christine Counsell explored this with KS3 children; I’m going to adapt it for KS2 next year.

  4. I had hoped to bring all the KOs and quizzes together into a nicely printed and bound book ready for revision before the final end-of-year assessments. In fact, ideally this booklet would be ready at the start of next year, so that children could revise from it at spare moments – not only at home and during specific revision lessons, but also when they had a supply teacher (for part of the day, for example), or in those odd 20-minute slots you sometimes get after a workshop has finished or before it starts. I wanted it to be properly printed and spiral bound to look ‘posh’ and important. However, I really underestimated how much paper all this generates. There was I worrying we weren’t covering enough content – when we gathered it all together it took up 36.4MB. The price for getting a hard copy printed for each child (for their year group only) came to over £1500 – well beyond our budget. So a member of the admin team spent a whole day photocopying everything. By copying stuff back to back we were able to make it slim enough for the photocopier to staple. These were then put into those A4 see-through plastic pouches – we call them ‘slippery fish’ at our school. They didn’t have anywhere near the gravitas that I’d hoped for – stapled at one corner only, with pages inevitably tearing off. The teachers didn’t let them go home until the final weekend because they were scared they would get lost. So much for the lovely idea that we would present leavers with a bound copy of all the KOs and quizzes they had had since year 2. So unless you have a friendly parent in the printing business or can get someone to sponsor you, be prepared for a low-tech, photocopier-intensive solution. In hindsight, if every class had had a homework book that the KOs and quizzes went into as we went along, that would have solved the problem.

So there we have it. The top tip is to learn from what is already out there, adapting and honing what others have already done. Then please share back.

[1] I’m talking from a primary perspective here. The message to secondary schools is similar, but more along the lines of ‘forget your PiXL box of magic tricks and start making sure your kids are really learning important stuff.’

[2] Yes, I know, officially RE and science are ‘core’ subjects. They are not really though, in practice, are they? That’s partly what Amanda and Sean are getting at.

[3] Kirschner, P. A., Sweller, J. and Clark, R. E., 2006. Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41(2), p. 77.

[4] I had intended to write about what is at stake in this blog, but it’s long enough already. Another time, maybe. I do talk about the issues in my initial blog on KOs mentioned at the start, though, if you are looking for help.


Test to the Teach

When Daisy Christodoulou told us not to teach to the test, I assumed she was mainly concerned with teachers spending too much lesson time making sure children understood the intricacies of the mark scheme at the expense of the intricacies of the subject. Personally, I’ve never spent that much time on the intricacies of any mark scheme. I’ve been far too busy making sure children grasp the rudimentary basics of how tests work to have time spare for anything intricate. For example, how important it is to actually read the question. I spend whole lessons stressing ‘if the question says underline two words that mean the same as…, that means you underline TWO words. Not one word, not three words, not two phrases. TWO WORDS.’ Or if the question says ‘tick the best answer’ then, and yes, I know this is tricky, the marker is looking to see if you can select the BEST answer from a selection which will have been deliberately chosen to include a couple that are half right. BUT NOT THE BEST. (I need to lie down in a darkened room just thinking about it.)

But this is not Christodoulou’s primary concern.

Christodoulou’s primary concern is that the way we test warps how we teach. While she is well aware that the English education system’s mania for holding us accountable distorts past and present assessment systems into uselessness, her overriding concern is one of teaching methodology. She contrasts the direct teaching of generic skills (such as inference, for example) with a methodology that believes such skills are better taught indirectly, by teaching a range of more basic constituent things first and getting those solid. This approach, she argues, creates the fertile soil in which inferring (or problem solving or communicating or critical thinking or whatever) can thrive. It is a sort of ‘look after the pennies and the pounds will look after themselves’ or (to vary the metaphor) ‘a rising tide raises all boats’ methodology. Let me try to explain…

I did not come easily to driving. Even steering – surely the easiest part of the business – came to me slowly, after much deliberate practice in ‘not hitting anything.’ If my instructor had been in the business of sharing learning objectives she would surely have told me that ‘today we are learning to not hit anything.’

Luckily for the other inhabitants of Hackney, she scaffolded my learning by only letting me behind the wheel once we were safely on the deserted network of roads down by the old peanut factory. The car was also dual control, so she pretty much covered the whole gears and clutch business whilst I concentrated hard on not hitting anything. Occasionally she would lean across and yank the steering wheel too. However, thanks to her formative feedback (screams, yanks, the occasional extempore prayer), I eventually mastered both gears and not-hitting-anything. Only at that point did we actually go on any big roads or ‘play with the traffic’, as she put it. My instructor did not believe that the best way to get me to improve my driving was by driving. Daisy Christodoulou would approve.

Actually, there was this book we were meant to complete at the end of each lesson. Michelle (my instructor) mostly ignored this, but occasionally she would write something – maybe the British School of Motoring does book looks – such as ‘improve clutch control’, knowing full well the futility of this: if I actually knew how to control a clutch I bloody well would. She assessed that what I needed was lots and lots of safe practice of clutch control with nothing else to focus on. So most lessons (early on anyway) were spent well away from other traffic, trying to change gears without stalling, jumping or screeching, with in-the-moment verbal feedback guiding me. And slowly I got better. If Michelle had had to account for my progress towards passing my driving test, she would have been in trouble. Whole areas of the curriculum, such as overtaking, turning right at a junction and keeping the correct distance between vehicles, were not even attempted until after many months of lessons had taken place. Since we did not do (until right near the very end) mock versions of the driving test, she was not able to show her managers a nice linear graph showing what percentage of the test I had, and had not yet, mastered. I would not have been ‘on track’. Did Michelle adapt my learning to fit in with these assessments? Of course not! She stuck with clutch control until I’d really got it and left ‘real driving’ to the future – even though this made it look like I was (literally) going nowhere, fast. Instead Michelle just kept on making sure I mastered all the basics and gradually added in other elements as she thought I was ready for them. In the end, with the exception of parallel parking, I could do everything just about well enough. I passed on my third attempt.

I hope this extended metaphor helps explain Christodoulou’s critique of teaching and assessment practices in England today. Christodoulou’s book ‘Making Good Progress?’ explores why it is that the assessment revolution failed to transform English education. After all, the approach was rooted in solid research and was embraced by both government and the profession. What could possibly go wrong?

One thing that went wrong, explains Christodoulou, is that instead of teachers ‘using evidence of student learning to adapt…teaching…to meet student needs’[1], teachers adapted their teaching to meet the needs of their (summative) assessments. Instead of assessment for learning we got learning for assessment.

Obviously assessments don’t actually have needs themselves. But the consumers of assessment – and I use the word advisedly – do. There exist among us voracious and insatiable accountability monsters, who need feeding at regular intervals with copious bucketfuls of freshly churned data. Imagine the British School of Motoring held pupil progress meetings with their instructors. Michelle might have felt vulnerable that her pupil was stuck at such an early stage, and might have looked at the driving curriculum to see if there were some quick wins she could get ticked off before the next data drop. Preferably anything that doesn’t require you to drive smoothly in a straight line… signalling, for example.

But this wasn’t even the main thing that went wrong. Or rather, something was already wrong that no amount of AfL could put right. We were trying to teach skills like inference directly when, in fact, these, so Christodoulou argues, are best learnt more indirectly by learning other things first. Instead of learning to read books by reading books, one should start with technical details like phonics. Instead of starting with maths problem solving, one should learn some basic number facts. Christodoulou describes how what is deliberately practised – the technical detail – may look very different from the final skill in its full glory. Phonics practice isn’t the same as reading a book. Learning dates off by heart is not the same as writing a history essay. Yet the former is a necessary, if not sufficient, basis for the latter. To use my driving metaphor, practising an emergency stop on a deserted road at 10mph when you know it’s coming is very, very different from actually having to screech to a stop from 40mph on a rainy day in real life, when a child runs out across the road. Yet the former helped you negotiate the latter.

The driving test has two main parts: technical control of the vehicle and behaviour in traffic (a.k.a. playing with the traffic). It is abundantly clear that to play with the traffic safely, the learner must have mastered a certain amount of technical control of the vehicle first. Imagine Michelle had adopted the generic driving skill approach and assumed these technical matters could be picked up en route, in the course of generally driving about – that I could negotiate left and right turns at the same time as maintaining control of the vehicle. When I repeatedly stall, because the concentration it takes to both brake and steer distracts me from concentrating on changing gears to match this slower speed, Michelle tells me that I did not change down quickly enough, which I find incredibly frustrating because I know I’ve got a gears problem, and it is my gears problem I need help with. But what I don’t get with the generic skill approach is time to practise changing gears up and down as a discrete skill. That would be frowned on as being ‘decontextualised’. I might protest that I’d feel a lot safer doing a bit of decontextualised practice right now – but drill and practice is frowned upon – it isn’t real driving after all – and in the actual test I am going to have to change gears and steer and brake all at the same time (and not hit anything), so better get used to it now.

Christodoulou argues that the direct teaching of generic skills leads to the kind of assessment practice that puts the cart, if not before the horse, then parallel with it. Under this approach, if you want the final fruit of a course of study to be an essay on the causes of the First World War, the route map to this end point will be punctuated with ‘mini-me’ variations of this final goal; shorter versions of the essay, perhaps. These shorter versions are then used by the teacher formatively, to give the learner feedback about the relative strengths and weaknesses of these preliminary attempts. All the learner then has to do, in theory, is marshal all this feedback together and address any shortcomings whilst retaining, and possibly augmenting, any strengths. However, this often leaves the learner none the wiser about precisely how to address their shortcomings. Advice to ‘be more systematic’ is only useful if you understand what being systematic means in practice, and if you already knew that, you probably would have been systematic in the first place.[2]

It is the assessment of progress through interim assessments that strongly resemble the final exam that Christodoulou means by teaching to the test. Not because students shouldn't know what format an exam is going to take and have a bit of practice on it towards the very end of a course of study. That's not teaching to the test. Teaching to the test is working backwards from the final exam, writing a curriculum punctuated by slightly reduced versions of that exam, and then teaching each set of lessons with the next test in mind. The teaching is shaped by the approaching test. This is learning for assessment. By contrast, Christodoulou argues that we should just concentrate on teaching the curriculum, and that there may be a whole range of other activities to assess how this learning is going that look nothing like the final learning outcome. These, she contends, are much better suited to helping the learner actually improve their performance. For example, the teacher might teach the students what the key events were in the build-up to the First World War, and then, by way of assessment, ask students to put these in correct chronological order on a timeline. Feedback from this sort of assessment is very clear: if events are in the wrong order, the student needs to learn them in the correct order. The teacher teaches some small component that will form part of the final whole, and tests that discrete part. Testing to the teach, in other words, as opposed to teaching to the test.

There are obvious similarities with musicians learning scales and sports players doing specific drills – getting the fine details off pat before trying to orchestrate everything together. David Beckham apparently used to practise free kicks from all sorts of positions outside the penalty area until he was able to hit the top corner of the goal with his eyes shut. This meant that in the fury and flurry of a real, live game, he was able to hit the target with satisfying frequency. In the same way, Christodoulou advocates spending more time teaching and assessing progress in acquiring decontextualised technical skills and less time on the contextualised 'doing everything at once', 'playing with the traffic' kind of tasks that closely resemble the final exam. Only when we do this, she argues, will assessment for learning be able to bear fruit. When the learning steps are small enough and comprehensible enough for the pupil to act on them, then and only then will AfL be a lever for accelerating pupil progress.

Putting my primary practitioner hat on, applying this approach in some areas (for example reading) chimes with what we already do, but in others (I'm thinking writing here) the approach seems verging on the heretical. Maths deserves a whole blog to itself, so I'm going to leave that for now – whilst agreeing whole-heartedly that thorough knowledge of times tables and number bonds (not just to ten but within ten and within twenty – so including 3+5 and 8+5, for example) is absolutely crucial. Indeed, I'd go so far as to say number bonds are even more important than times table knowledge, but harder to learn and rarely properly tested. I've mentioned Hit the Button in a previous blog. We have now created a simple spreadsheet that logs each child's score from year 2 to year 6 in the various categories for number bonds. Children start with 'make 10' and stay on this until they score 25 or more (which means 25 correct in 1 minute, which I reckon equates to automatic recall). They then proceed through the categories in turn – 'missing numbers' and 'make 100', with lower target scores of 15. Then they skip the two decimals categories and go to the times tables section – which has division facts as well as multiplication facts. Yes! When they've got those off pat, they can return to do the decimals and the other categories. We've shared this, and the spreadsheet, with parents, and some children are practising at home each night. With each game only taking one minute, it's not hard to insist that your child plays, say, three rounds of this first, before relaxing. In class, the teachers test a group each day, using their set of 6 iPads. However, since Kindle Fires were on sale for £34.99 recently, we've just bought 10 of them (about the same as the cost of one iPad). We'll use them for lots of other things too, of course – anything where all you really need is access to an internet browser.
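The progression rule itself is simple enough to sketch in code. Here is a minimal sketch in Python, assuming the category names and target scores described above; the function and data shapes are my own invention, not anything built into Hit the Button or our spreadsheet:

```python
# A sketch of the progression rule: category names and targets follow the
# post above; everything else is hypothetical.
CATEGORIES = [
    ("make 10", 25),          # 25+ correct in a minute = automatic recall
    ("missing numbers", 15),  # lower target score of 15
    ("make 100", 15),
    ("times tables", 25),     # division facts as well as multiplication
    ("decimals", 25),         # skipped at first, returned to at the end
]

def next_category(scores: dict[str, int]) -> str:
    # Work through the categories in order; the child practises the first
    # one whose target score hasn't yet been reached.
    for category, target in CATEGORIES:
        if scores.get(category, 0) < target:
            return category
    return "all categories mastered"

print(next_category({"make 10": 27, "missing numbers": 12}))
# -> missing numbers
```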

When we talk about mastery, people often talk about it as if it's some elusive higher plane that the clever kids might just attain in a state of mathematical or linguistic nirvana, when really what it means is that every single child in your class – unless they have some really serious learning difficulty – has automatic recall of these basic number facts and (then later) their times tables. And can use full stops and capital letters correctly the first time they write something. And can spell every word on the year 3 and 4 word list (and year 1 & 2 as well, of course). And read fluently – at least 140 words a minute – by the time they leave year 6. And have books they love to read, having read at least a million words for pleasure in the last year. (We use Accelerated Reader to measure this – about half of year 6 are word millionaires already this year and a quarter have read over 2 million words.) How about primary schools holding themselves accountable to their secondary schools for delivering cohorts of children who have mastered all of these (with allowances for children who have not been long at the school or who have special needs)? A bit like John Lewis being 'Never Knowingly Undersold', we should aim, among other things, to ensure that, at the very least, all our children who possibly could have got these basics securely under their belt.

(My teacher husband and I now pause to have an argument about what should make it to the final list.   Shouldn’t something about place value be included? Why just facts?  Shouldn’t there be something about being able to use number bonds to do something?  I’m talking about a minimum guarantee here – not specifying everything that should be in the primary curriculum. He obviously needs to read the book himself.)

Reading

My extended use of the metaphor of learning to drive to explain Christodoulou's approach has one very obvious flaw. We usually teach classes of 30 children, whereas driving lessons are normally conducted 1:1. It is all very well advocating spending as much time on the basics as is necessary before proceeding to orchestrating several different skills all at the same time, but imagine the frustration the more able driver would have felt stuck in a class with me and my poor clutch control. They would want to be out there on the open roads, driving, not stuck behind me and my kangaroo petrol. Children arrive at our schools at various starting points. Some children pick up the sound-grapheme correspondences almost overnight; for others it takes years. I lent our phonics cards to a colleague to show her three-year-old over the weekend; by Monday he knew them all. Whereas another pupil, now in year 5, scored under 10 in both his KS1 phonics checks. I tried him again on it recently and he has finally passed. He is now just finishing turquoise books. In other words, he has just graduated from year 1 level reading, four years later. This despite daily 1:1 practice with a very skilled adult, reading from a decodable series he adores (Project X Code), as well as recently starting on the most decontextualised reading programme ever (Toe by Toe – which again he loves) and playing SWAP. He is making steady progress – which fills him with pride – but even if his secondary school carries on with the programme[3], at this rate he won't really be a fluent reader until year 10. I keep on hoping a snowball effect will occur and the rate of progress will dramatically increase.

Outliers aside, there is a range of ability (or prior attainment, if you prefer) in every class, and for something as technical as phonics this is most easily catered for by having children in small groups, depending on their present level. We use ReadWriteInc in the early years and KS1. Children are assessed individually by the reading leader for their technical ability to decode, segment and blend every half term, and groups are adjusted accordingly. So that part of our reading instruction is pretty Christodoulou-compliant, as I would have thought it is in most infant classes. But what about the juniors, or late year 2, once the technical side is pretty sorted and teachers turn to teaching reading comprehension? Surely, if ever a test was created solely for the purpose of being able to measure something, it was the reading comprehension test, with the whole of the KS2 reading curriculum one massive, time-wasting exercise in teaching to the test?

I am well aware of the research critiquing the idea that there are some generic comprehension skills that can be taught from specific texts and then applied across many texts, as Daniel Willingham explores here. Christodoulou quotes Willingham several times in her book, and her critique of generic skills is obviously influenced by his work. As Willingham explains, when we teach reading comprehension strategies we are actually teaching vocabulary, noticing understanding, and connecting ideas (my emphasis). In order to connect ideas (which is what inference is), the reader needs to know enough about those ideas to work out what hasn't been said as well as what has been. Without specific knowledge, all the generic strategies in the world won't help. As Willingham explains:

'Inferences matter because writers omit a good deal of what they mean. For example, take a simple sentence pair like this: "I can't convince my boys that their beds aren't trampolines. The building manager is pressuring us to move to the ground floor." To understand this brief text the reader must infer that the jumping would be noisy for the downstairs neighbors, that the neighbors have complained about it, that the building manager is motivated to satisfy the neighbors, and that no one would hear the noise were the family living on the ground floor. So linking the first and second sentence is essential to meaning, but the writer has omitted the connective tissue on the assumption that the reader has the relevant knowledge about bed-jumping and building managers. Absent that knowledge the reader might puzzle out the connection, but if that happens it will take time and mental effort.'

So what the non-comprehending reader needs is very specific knowledge (about what it's like to live in a flat), not some generic skill. It could be argued, then, that schools should spend more time teaching specific knowledge and less time on elusive and non-existent generic reading skills. However, Willingham concedes that the research shows that, even so, teaching reading comprehension strategies does work. How can this be, he wonders? He likens the teaching of these skills to giving someone vague instructions for assembling Ikea flat-pack furniture:

'Put stuff together. Every so often, stop, look at it, and evaluate how it is going. It may also help to think back on other pieces of furniture you've built before.'

This is exactly the process we go through during shared reading. On top of our daily phonics lessons we have two short lessons a week of shared reading, where the class teacher models being a reader using the 'eric' approach. In other words, we have daily technical lessons, and twice a week we also have a bit of 'playing with the traffic' – or, more accurately, listening to the teacher playing with the traffic and talking about what they are doing as they do it. In our shared reading lessons, by thinking out loud about texts, the teacher makes it very explicit that texts are meant to be understood and enjoyed, not just barked at, and that we should therefore check as we go along that we are understanding what we are reading (or looking at). If we don't understand something, we should stop and ask ourselves questions. It is where the teacher articulates that missing 'connective tissue', or 'previous experience of building furniture' to use Willingham's Ikea metaphor, sharing new vocabulary and knowledge of how the world works – knowledge that many of our inner-city children do not have. (Although actually, for this specific instance, many of them would know about noisy neighbours, bouncing on beds and the perils of so doing whilst living in flats.)


For example, the picture used in the 'eric' link above gives the teacher the opportunity to share their knowledge that sometimes the sea can get rough, and that this means the waves get bigger and the wind blows strongly. Sometimes it might blow so hard that it could even blow your hat right off your head. As the waves rise and fall, the ship moves up and down and tilts first one way, and then the other. (Pictures are sometimes used for this rather than texts so that working memory is relieved of the burden of decoding.)

When teaching children knowledge is extolled as the next panacea, it's not that I don't agree; it's just that I reckon people really underestimate quite how basic some of the knowledge we need to impart to our younger children is. I know of primary schools proudly adopting a 'knowledge curriculum' and teaching two hours of history a week, with two years given over to learning about the Ancient Greeks. I just don't see how this will help children understand texts about noisy neighbours, or about what the sea is like (although you could cover that in the course of learning about Ancient Greece, if you realised children didn't know), or, for that matter, what it is like to mill around in bewilderment. The only kind of assessment that will help here is the teacher's ear-to-the-ground, minute-by-minute assessment – realising that, oh, some of them haven't ever seen the sea, or been on a boat; they don't know about waves, or how windy it can be, or how you rock up and down. This is the kind of knowledge that primary teachers in disadvantaged areas need to talk about all the time. And why we need to go on lots of trips too. But it is not something a test will pick up, nor something you can measure progress gains in. The only way to increase vocabulary is one specific word at a time. It is also why we should never worry about whether something is 'relevant' to the children or not. If it is too relevant, then they already know about it – the more irrelevant the better.

I don't entirely agree with the argument that since we can't teach generic reading skills we should instead teach lots more geography and history, since this will give children the necessary knowledge they need to understand what they read. We need to read and talk, talk, talk about stories and their settings – not just what a mountain is but how it feels to climb a mountain or live on a mountain, how that affects your daily life, how you interact with your neighbours. We need to read more non-fiction aloud, starting in the early years. We need to talk about emotions and body language and what the author is telling us by showing us. A quick Google will turn up writers' body-language 'cheat sheets'. We need to reverse engineer these and explain that if the author has their character fidgeting with darting eyes, that probably means they are feeling nervous. Some drama probably wouldn't go amiss either. Willingham's trio of teaching vocabulary, noticing understanding, and connecting ideas is a really helpful way for primary teachers to think about what they are doing when they teach reading comprehension. What we need to assess, and feed back to children on, is how willing they are to admit they don't understand something, to ask what a word means, to realise they must be missing some connection. None of this is straightforwardly testable. That doesn't mean it isn't important.

Writing

Whereas most primary schools, to a greater or lesser degree, teach reading by first teaching phonics, writing is much more likely to be taught through students writing than through teaching a series of sub-skills. It is the idea that we should ensure technical prowess before we spend too much time on creative writing that most challenges the way we currently do things.

Of course we teach children to punctuate their sentences with capital letters and full stops right at the start of their writing development. However, patently, this instruction has limited effectiveness for many children. They might remember when they are at the initial stages and only write one sentence anyway – it's not so hard to remember the final full stop in that case. Where it all goes wrong is once they start writing more than one sentence, further complicated when they start writing sentences with more than one clause. I've often thought we underestimate how conceptually difficult it is to understand what a sentence actually is. Learning to use speech punctuation is far easier than learning what is, and what is not, a sentence. Many times we send children back to put in their full stops when, actually, they don't really get where full stops go. On my third session doing 1:1 tuition with a year 5 boy, he finally plucked up the courage to tell me that he knew he should use full stops but he just didn't get how you knew where sentences ended. So I abandoned what I'd planned and instead we learnt about sentences. I told him that sentences had a person or a thing doing something, and that after those two crucial bits we might get some extra information about where or why or with whom or whatever, which belongs with the person/thing, so needs to be in the same sentence. We analysed various sentences, underlining the person/thing in one colour, the doing-something word in another colour, and finally the extra information (which could be adjectives, adverbs, prepositions, the object of the sentence – the predicate minus the verb, basically) in another. This was some time ago, before the renaissance of grammar teaching, so it never occurred to me to use the terms 'subject', 'noun', 'verb' etc., but I would do now. It was all done on the hoof, but after three lessons he had got it and, even better, could apply it in his own writing.

What Christodoulou is advocating is that instead of waiting until things have got so bad they need 1:1 tuition to put right, we systematically teach sentence punctuation (and other common problems such as subject-verb agreement), giving greater priority to this than to creative writing. In other words, stop playing with the traffic before you've mastered sufficient technical skills to do so properly. This goes against normal primary practice, but I can see the sense in it. If 'practice makes permanent', as cognitive psychology tells us (see chapter 7 of What Every Teacher Needs to Know About Psychology by Didau and Rose for more on this), then the last thing we want is for children to practise doing something incorrectly again and again. But this is precisely what our current practice does. Because most of the writing we ask children to do is creative writing, children who can't punctuate their sentences get daily practice in doing it wrong. The same goes for letter formation and the spelling of high-frequency common exception words. Maybe instead we need to spend far more time in the infants, and into year 3 if necessary, on drills where we punctuate text without the added burden of composing as we go. Maybe this way, working memories would not become so overburdened with thinking about what to say that the necessary technicalities went out of the window. After that, we could rewrite the correctly punctuated text in correctly formed handwriting. Some children have genuine handwriting or spelling problems, and I wouldn't want to condemn dyslexic and dyspraxic children to permanent technical practice. However, if we did more technical practice in the infants – which would mean less time for writing composition – we might spot who had a genuine problem earlier and then put in place specific programmes to help them and/or aids to get round the problem another way. After all, not all drivers use manual transmission; some drive automatics.

Christodoulou mentions her experience of using the 'Expressive Writing' direct instruction programme, which I duly ordered. I have to say it evoked a visceral dislike in me: nasty cheap paper, crude line drawings, totally decontextualised – it's everything my primary soul eschews (and it's expensive to boot). However, the basic methodology is sound enough, and Christodoulou only mentions it because it is the one she is familiar with; it is not like she's giving it her imprimatur or anything. I'm loath to give my teachers more work, but I don't think it would be too hard to invent some exercises that are grounded in the context of something else children are learning: some sentences about Florence Nightingale or the Fire of London, for example, or a punctuation-free excerpt from a well-loved story. Even if we only did a bit more of this and a bit less of writing compositions where we expect children to orchestrate many skills all at once, we should soon see gains in children's creative writing too. Certainly, we should insist on mastery in these core writing skills by year 3, and where children still can't punctuate a sentence, be totally ruthless in focusing on that until the problem is solved. And I don't just mean that they can edit in their full stops after the fact; I mean they put them (or almost all of them) in as they write. It needs to become an automatic process. Once it is automatic, it is easy. Otherwise we are not doing them any favours in the long term, as we are just making their error more and more permanent and harder and harder to undo.

Certainly pupil progress meetings would be different. Instead of discussing percentages and averages, the conversation would be very firmly about the teacher sharing the gaps in knowledge they had detected, the plans they had put in place to bridge those gaps, and progress to date in so doing – maybe courtesy of the 'Hit the Button' spreadsheet, some spelling tests, end-of-unit maths tests, or records of increasing reading fluency. Already last July, our end-of-year reports for parents shared with them which number facts, times tables and spellings (from the year word lists) their child did not yet know… with the strong suggestion that the child work on these over the summer! We are introducing 'check it' mini assessments so that we can check that what we taught three weeks ago is still retained. It's easy: we just test to the teach.

[1] Christodoulou quoting D. Wiliam, Making Good Progress?, p. 19.

[2] Christodoulou quoting D. Wiliam, Making Good Progress?, p. 20.

[3] I say this because our local secondary school told me they didn't believe in withdrawing children from class for interventions – not even reading interventions. Surely he could miss MFL and learn to read in English first, as a minimum? Why not English lessons? I know he is 'entitled' to learn about Macbeth, but at the expense of learning to read? Is Macbeth really that important? Maybe he will go to a different secondary school, or they'll change their policy.


Milling around in bewilderment: that reading comprehension.

You can see the reading test for yourself from this link.

The day started well, dawn casting spun-gold threads across a rosy sky. The long wait was over; SATs week was finally here. And it looked like summer had arrived. Year 6 tripped into classrooms while head teachers fumbled skittishly with secret keys in hidden cupboards. Eventually, teachers across the nation ripped open plastic packets. Perhaps at first their fears were calmed, for the text – or what you can glean about it from reading snippets here and there as you patrol the rows – didn't seem too bad. In previous weeks children had struggled with excerpts from The Lady of Shalott, Moonfleet and Shakespeare. The language here looked far more contemporary.

But no. Upon completion, children declared the test was hard – really hard. Many hadn't finished – including children who usually tore through tests like a… white giraffe? What is more, the texts didn't seem to be in any kind of order. We had drilled into them, as per the test specification guide, that the texts would increase in difficulty throughout the paper (section 6.2). Yet the middle text was almost universally found to be the hardest. Some declared the final text the easiest. What was going on?

Tests safely dispatched, I decided to take a proper look. It didn't take long to become apparent that the texts contained demanding vocabulary and some tortuous sentence structure. The difference from the sample test material was stark. Twitter was alive with tales of sobbing kids and angry teachers. Someone said they had analysed the first paragraph of the first text and it came out with a reading age of 15. Debate followed: was this really true or just a rumour? Were readability tests reliable? I tweeted that it was a test of how middle class and literary one's parents were, having identified 45 words I reckoned might challenge our inner-city children. After all, as a colleague remarked, 'my three-year-old knows more words than some children here'. Other people drew groans by mentioning how irrelevant the texts were to the kind of lives their children lived. I seemed to be implicated in this criticism… although it's difficult to tell who's criticising whom sometimes on Twitter. Still, I was put out. I don't care if texts are 'relevant', I retorted. I cared that the vocabulary needed to answer questions favoured a 'posh' demographic. Apparently, this was patronising. I saw red at this point! It's not that poorer children can't acquire a rich vocabulary, but that since it is well known that a rich vocabulary is linked to parental income, and the domain 'rich vocabulary' is huge (and undefined), it is not fair or useful to use tests that rely on good vocabulary for accountability. And then I put a link to this previous post of mine, where I've explained this in more depth. If accountability tests over-rely on assessing vocabulary as a proxy for assessing reading, this hands a free pass to schools chock-full of children like my colleague's three-year-old, since such children arrive already stuffed to the gills with eloquence and articulacy, whereas the poorer the intake, the greater the uphill struggle to enable the acquisition of the kind of cultural capital richer children imbibe with their mother's milk.

Flawed as the previous reading tests were, they did not stack the cards against schools serving language-poor populations. The trouble with using vocabulary as a measure is that each individual word is so specific. Usually what we teach is generalisable from one context to another; learning words, however, has to be done on a case-by-case basis. I recently taught year 6 somnolent, distraught and clandestine, among many others. I love teaching children new words, and they love acquiring them. But unless there is some sort of finite list against which we are to be judged, I'd rather not have our school judged by a test that is hard to pass without an expansive and sophisticated vocabulary. With the maths and SPAG tests, we know exactly what is going to be tested. The domain is finite. We worry about how to teach it so it is understood and remembered, but we do not worry that some arcane bit of maths will worm its way into the test – nautical miles, for example. Not so with reading. Any word within the English language is fair game – including several that don't regularly appear in the vocabulary of the average adult. There may be very good reasons for the government to want to ascertain the breadth of vocabulary acquisition across the nation. In which case, they could instigate a vocabulary test – maybe something along the lines of this. But that shouldn't be confused with the ability to read. To return to our earlier example, my colleague's three-year-old may have an impressive vocabulary but she can't actually read much at all yet, whereas our 11-year-olds may not know as many words but are happily enjoying reading the Morris Gleitzman 'Once' series.

It is becoming accepted that reading is not just the orchestration of a set of skills, but requires knowledge of the context for the reader to make sense of the bigger picture. But that's not what happened here. It's not the case that children found the texts difficult because they lacked knowledge of the context. The context of the first text was two children exploring outdoors. True, only 50% of our present year 6 knew what a monument was at the outset – a bit tricky since this was pretty central to the test – but by the end of the story they sort of worked it out for themselves. The second text featured a young girl disobeying her grandmother and taking risks. And a giraffe. Well, I reckon this is pretty familiar territory (grandmothers and risks, I mean), and while we do not meet giraffes every day in Bethnal Green, we know what they are. The third and final text told us all about dodos and how they may have been unfairly maligned by Victorian scientists. That was a bit more remote from everyday experience, but not so terribly outlandish as to render the text impenetrable. The third text is meant to be harder, and the children are meant to have studied evolution and extinction in science by then anyway. So it wasn't that the Sitz im Leben was so abstruse as to render comprehension impossible. The problem was the words used within the texts and the high number of questions which were dependent upon knowing what those words meant. The rather convoluted sentence structure in the second text didn't help either – but if the words had been more familiar, children might have stood more of a fighting chance.

According to the test specification, questions can be made difficult in five different ways, based on research commissioned by the PISA people. It's an interesting and informative read, so I'm not arguing with the methodology per se – I don't know nearly enough to even attempt that. Amateur though I am, I do argue with the relative proportions allocated to each of the five strategies in this test.

With three of these, I have no quarrel. Firstly (and my ordering is different from that in the document), questions can be made easier or harder in terms of accessibility: how easy is it to find the information? Is the student signposted to it (e.g. 'see the first paragraph on page 2')? Or is the question difficulty raised by not signposting, and possibly by having distractor items to lure students down dead ends? I think we have little to complain about here. For example, question 30 has clear signposting ('Look at the paragraph beginning: Then, in 2005…'), whereas in question 33 the relevant information is much harder to find – it's a 'match the summary of the paragraph to the order in which they occur' question.

Secondly, questions may vary in terms of task-specific complexity. How much work does the student have to do to answer the question?  Is it a simple information retrieval task or does the pupil have to use inference?

For example, question 7 is easy in this regard: 'Write down three things you are told about the oak tree.' The text clearly says the oak tree was 'ancient'. I haven't checked the mark scheme as it's not yet published as I write, but I am assuming that's enough to earn you 1 mark. Whereas question 3 is a bit harder: 'How can you tell that Maria was very keen to get to the island?' Students need to infer this from the fact that she said something 'impatiently'. There are far fewer of this kind of question under this new regime – but we were expecting that, and the sample paper demonstrated it. Again – no complaints. Indeed, the test specification does share the relative weightings of different skills (in section 6.2.2, table 9), but the bands are so wide it's all a bit meaningless. Inference questions can make up between 16% and 50% of all questions, for example.

Thirdly, the response strategy can be more or less demanding: a one-word answer versus a three-mark 'explain your opinion' question.

The two final ways to make questions more or less difficult are by varying the extent of the knowledge of vocabulary required by the question (strategy 5 in the specification document), or by varying the complexity of the target information needed to answer the question (strategy 2). The document goes on to explain that the latter means varying:

• the lexico-grammatical density of the stimulus

• the level of concreteness / abstractness of the target information

• the level of familiarity of the information needed to answer the question

and that 'there is a low level of semantic match between task wording and relevant information in the text'.

I'm not quite sure what the difference is between 'lexico-grammatical density' (strategy 2) and the knowledge of vocabulary required by the question (strategy 5), but the whole thrust of this piece is that the texts were pretty dense lexico-grammatically and in terms of the vocabulary needed to answer the questions. When compared with the sample test, the contrast is stark. Now, I'm no expert in linguistics or test-question methodology. I'm just a headteacher with an axe to grind, a weekend to waste and access to Google. But this has infuriated me enough to do a fair bit of reading around the subject.

On the Monday evening post-test, Twitter was alive with people quoting someone who apparently had said that the first paragraph of the first text had a Flesch-Kincaid score equivalent to that of a 15-year-old. I'd never heard of Flesch-Kincaid – or any of the other readability tests – so I did some research and found that indeed, the first paragraph of The Lost Queen was rated as suitable for 8th-9th graders – or 13-15-year-olds in the British system. But there was also criticism online that the readability tests rated the same texts quite differently, so weren't a reliable indicator of much. (Someone put a link up to an article about this, which I foolishly forgot to bookmark and now can't find – do share the link again if it was you or you know a good source.)
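For the curious, the grade-level formula is easy enough to run yourself. Below is a rough Python sketch; the syllable counter is a crude vowel-run heuristic, which is one reason different tools can disagree about the same passage.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # US grade level = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
```

A grade of 8-9 on this scale corresponds roughly to 13-15-year-olds in the British system.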

Anyway, be that as it may, I decided to run readability tests on various bits and pieces of the SATs paper. And this is what I discovered (texts listed in order of alleged difficulty):

The Lost Queen, first paragraph: 13-15 year olds

Wild Ride, first paragraph: 13-15 year olds

Wild Ride, 'bewildered' paragraph: 18-22 year olds

Way of the Dodo, first paragraph: 13-15 year olds (and a lower score than The Lost Queen)

Way of the Dodo, second paragraph: 13-15 year olds

So there you have it: insofar as Flesch-Kincaid has any reliability, the supposedly hardest text was in fact the easiest, and the middle text was the hardest.

I did the same with the sample paper. The first text had a readability level for an 11-12-year-old, and the second for a 13-15-year-old. I had lost the will to live by then, so didn't do the third text – but it is clearly much more demanding than the previous two, as it should be.

I also used the Automated Readability Index, and while this gave slightly different age ranges, the relative difficulty was the same and all the texts were for children older than 11 – the easiest being… the first part of The Way of the Dodo.
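The Automated Readability Index is, if anything, even simpler to compute, since it uses character counts rather than syllables. Again, a rough sketch using the published formula:

```python
import re

def automated_readability_index(text: str) -> float:
    # ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    characters = sum(len(w) for w in words)
    return 4.71 * (characters / n_words) + 0.5 * (n_words / sentences) - 21.43
```

Because it scores characters per word rather than syllables per word, it can rank the same texts slightly differently, which fits with the slightly different age ranges I got.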

However, it was also clear from my reading that readability tests are designed to help people writing, say, pamphlets for the NHS make the writing as transparent and easy as possible. In other words, they are intended to make reading simple so that people who aren't very good at it can understand stuff that may be very important. It struck me that maybe this wasn't exactly what we should be aiming for in a reading assessment. After all, we do want some really challenging questions at some point – we just want them at the end, where they are meant to be. We need readability tests because previous generations have not been taught well enough to be presented with demanding information. We want better for the children we now teach.

Which brought me to discover this site, which ranks words by their relative frequency in the English language. If we are going to be held accountable for the sophistication of the vocabulary our children can comprehend, then surely there should be some bounds on that. While the authority of this is contested, it seems to be generally held that the average adult knows about 20,000 words. (You can test yours here.) How many words the average 11-year-old does or should know I did not discover – so here are my ballpark suggestions.

For the first text – the one that is meant to be easier – there should be a cap at a ranking of 10,000. (I'm assuming here we understand that as words are used less frequently their ranking falls but the actual number rises: a ranking of 20,000 is lower than a ranking of 10,000. If this is not the correct convention for such matters, I apologise.) Definitions should be given for low-frequency words, especially if understanding them is critical to answering specific questions, in the same way in which 'savannah' was explained in the introduction to Wild Ride.

Then in the second text words could be limited to a ranking of 15,000, and in the third 20,000 – representing the average adult's vocabulary. I have plucked these figures from the air; I would not go to the stake for them. But you get my meaning. We need to pin down the domain of 'vocabulary' if we are to be held accountable when it is tested.
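If caps like these were ever adopted, checking a draft test against them would be trivial. A sketch, assuming a frequency-ranked word list like the one I used; the RANKS dictionary below is illustrative, standing in for the full 60,000-word database:

```python
# Illustrative only: a real check would load the full frequency-ranked list.
RANKS = {"monument": 4106, "haze": 8307, "sedately": 38421}

def over_cap(words: list[str], cap: int) -> list[str]:
    # Flag words whose rank number exceeds the cap (i.e. words rarer than
    # allowed). Words absent from the list are treated as rarest of all.
    return [w for w in words if RANKS.get(w.lower(), 10**6) > cap]

# First text capped at 10,000; second at 15,000; third at 20,000.
print(over_cap(["monument", "haze", "sedately", "skittishly"], 10_000))
# -> ['sedately', 'skittishly']
```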

For what it is worth, I asked our year 6 after the test to tell me which words they did not know. There are 30 children in the class. Words where half the class or more did not know the meaning included, from the first text: monument, haze, weathered (as a verb); from the second text: jockey, dam, promptly, sedately (zero children), counselled, arthritic, nasal, pranced, skittishly (zero children), milled, bewildered, spindly, momentum; and from the third text: haven, oasis (they knew this was a brand of drink, though), parched, receding, rehabilitate and anatomy. My Geordie partner tells me they would have known parched if they were northern, because that's Geordie for 'I'm really thirsty'. Here we can see again that the middle passage had the highest number of unknown words in my obviously unrepresentative sample. In fact, it was the first paragraph on page 8 – which henceforth shall be known as the bewildering paragraph – that seemed to have the highest lexico-grammatical density. As the mudflats entrapped the Mauritian dodos, so did this paragraph ensnare our readers, slowing them down to the extent that they failed to finish the questions pertaining to the relatively easy final text.

Maybe I’m wrong. Maybe when the statistics are finally in, there won’t be a starker-than-usual demarcation along class lines. I’d love to be wrong. Let’s hope I am.

And finally, what you've all been waiting for: what was the lowest-ranking word? Well, yes, of course, it was 'skittishly' – so rare it doesn't even appear in the database of 60,000 words I was using. But suitable for 11-year-olds, apparently.

In case you are interested, here are the full rankings. Where the word might be more familiar as a different part of speech, I have included a ranking for that word too. The words I chose to rank were just those my deputy and I thought children might find tricky.

Word / Part of speech / Ranking / Text / Necessary (N) or useful (U) for question number
(organized by rank, lowest to highest; TLQ = The Lost Queen, WR = Wild Ride, WD = Way of the Dodo)
skittishly adverb <60,000 WR
parch adjective 46,169 WD E29
sedately adverb 38,421 WR
clack noun 32,467 TLQ 4 distractor
sedate verb 23,110
prance verb 22,360 WR
skittish adjective 21,298
sedate adjective 20,481
arthritic adjective 20,107 WR
mossy adjective 19,480 TLQ U8 distractor, E9
misjudge verb 19140 WD
spindly adjective 19025 WR
bewilderment noun 17,410 WR E16
squeal noun 17,103 WR
sternly adverb 16,117 WR
plod verb 16,053 WR
prey verb 15,771 WD
dismount verb 15,601 WR
hush noun 15,394 TLQ N4
burrow noun 14,900 WR
hush verb 14,295
nocturnal adjective 13,755 WR E14
enraged adjective 13,378 WR
mill verb 13,378 WR E16
sprint noun 12,187 WR
squeal verb 12036
folklore noun 11,722 WD
rehabilitate verb 11,496 WD E31
oasis noun 10,567 WD U29
stern adjective 10,377
moss noun 10142
evade verb 9759 WR
sight verb 9730 WD
jockey noun 9723 WR
weather verb 9568 TLQ U8
blur noun 9319 WR
murky adjective 9265 TLQ U6
inscription noun 9164 TLQ U8
rein noun 8793 WR
sprint verb 8742
slaughter noun 8494 WD
anatomy noun 8310 WD E32
haze noun 8,307 TLQ 4 distractor
counsel verb 7905 WR
nasal adjective 7857 WR
recede verb 7809 WD
intent adjective 7747 WR
stubborn adjective 7680 WR question E15
slab noun 7585 TLQ U8
arthritis noun 7,498 WR
promptly adverb 6762 WR
blur verb 6451
haven noun 5770 WD
vine noun 5746 TLQ U6
defy verb 5648 WR E15
startle verb 5517 WR
drought noun 5413 WD
remains noun 5375 WD
devastating adjective 4885 WD
rehabilitation noun 4842
prey noun 4533
dam noun 4438 WR
momentum noun 4400 WR E18
ancestor noun 4178 TLQ N1.
monument noun 4106 TLQ U8
dawn noun 4044 WR N12a
intent noun 3992
click noun 3822
counsel noun 3441
indication noun 3401 WD
prompt adjective 3142
mount verb 3012
urge verb 2281 WR
cast verb 2052 WR
judge verb 1764
unique adjective 1735 WD E25
weather noun 1623
sight noun 1623
Word / Part of speech / Ranking / Text / Necessary (N) or useful (U) for question number
(organized by where the words appear in the texts)
monument noun 4106 TLQ U8
ancestor noun 4178 TLQ N1
clack noun 32,467 TLQ 4 distractor
hush noun 15,394 TLQ N4
hush verb 14,295
haze noun 8,307 TLQ 4 distractor
vine noun 5746 TLQ U6
murky adjective 9265 TLQ U6
weather verb 9568 TLQ U8 distractor, E9
weather noun 1623
mossy adjective 19,480 TLQ U8 distractor, E9
moss noun 10142
inscription noun 9164 TLQ U8
slab noun 7585 TLQ U8
dawn noun 4044 WR N12a
cast verb 2052 WR
jockey noun 9723 WR
dam noun 4438 WR
startle verb 5517 WR
nocturnal adjective 13,755 WR E14
promptly adverb 6762 WR
prompt adjective 3142
stubborn adjective 7680 WR question E15
defy verb 5648 WR
sedately adverb 38,421 WR
sedate verb 23,110
sedate adjective 20,481
plod verb 16,053 WR
arthritic adjective 20,107 WR
arthritis noun 7,498 WR
nasal adjective 7857 WR
squeal noun 17,103 WR
squeal verb 12036
burrow noun 14,900 WR
prance verb 22,360 WR
skittishly adverb <60,000 WR
intent adjective 7747 WR
intent noun 3992
enraged adjective 13,378 WR
mill verb 13,378 WR E16
bewilderment noun 17,410 WR E16
spindly adjective 19025 WR
evade verb 9759 WR
momentum noun 4400 WR E18
urge verb 2281 WR
sprint verb 8742
sprint noun 12,187 WR
blur noun 9319 WR
blur verb 6451
dismount verb 15,601 WR
mount verb 3012
sight verb 9730 WD
sight noun 1623
haven noun 5770 WD
slaughter noun 8494 WD
unique adjective 1735 WD E25
prey verb 15,771 WD
prey noun 4533
folklore noun 11,722 WD
remains noun 5375 WD
drought noun 5413 WD
oasis noun 10,567 WD U29
parch adjective 46,169 WD E29
recede verb 7809 WD
rehabilitate verb 11,496 WD E31
indication noun 3401 WD
anatomy noun 8310 WD E32
misjudge verb 19140 WD
judge verb 1764
devastating adjective 4885 WD

By way of contrast, I did the same with the sample test. In the first text there were no words I thought were hard enough to check. In the second there were four: cover (15,363), pitiful (13,211), brittle (10,462) and emerald (12,749). In the third and final passage there were eight: triumphantly (16,343), glade (20,257), unwieldy (16,922), sapling (16,313), foliage (7,465), lurch (9,339), ecstasy (9,629) and finally, ranking off the scale below 60,000, gambols.
