Test to the Teach

When Daisy Christodoulou told us not to teach to the test, I assumed she was mainly concerned with teachers spending too much lesson time making sure children understood the intricacies of the mark scheme at the expense of the intricacies of the subject. Personally, I’ve never spent that much time on the intricacies of any mark scheme. I’ve been far too busy making sure children grasp the rudimentary basics of how tests work to have time spare for anything intricate. For example, how important it is to actually read the question. I spend whole lessons stressing that if the question says ‘underline two words that mean the same as…’, that means you underline TWO words. Not one word, not three words, not two phrases. TWO WORDS. Or if the question says ‘tick the best answer’ then – and yes, I know this is tricky – the marker is looking to see if you can select the BEST answer from a selection which will have been deliberately chosen to include a couple that are half right. BUT NOT THE BEST. (I need to lie down in a darkened room just thinking about it.)

But this is not Christodoulou’s primary concern.

Christodoulou’s primary concern is that the way we test warps how we teach. While she is well aware that the English education system’s mania for holding us accountable distorts past and present assessment systems into uselessness, her overriding concern is one of teaching methodology. She contrasts the direct teaching of generic skills (such as inference) with a methodology that believes such skills are better taught indirectly, by teaching a range of more basic constituent things first and getting those solid. This approach, she argues, creates the fertile soil in which inferring (or problem solving or communicating or critical thinking or whatever) can thrive. It is a sort of ‘look after the pennies and the pounds will look after themselves’ or (to vary the metaphor) a ‘rising tide raises all boats’ methodology. Let me try to explain…

I did not come easily to driving. Even steering – surely the easiest part of the business – came to me slowly, after much deliberate practice in ‘not hitting anything.’ If my instructor had been in the business of sharing learning objectives she would surely have told me that ‘today we are learning to not hit anything.’

Luckily for the other inhabitants of Hackney, she scaffolded my learning by only letting me behind the wheel once we were safely on the deserted network of roads down by the old peanut factory. The car was also dual control, so she pretty much covered the whole gears and clutch business whilst I concentrated hard on not hitting anything. Occasionally she would lean across and yank the steering wheel too. However, thanks to her formative feedback (screams, yanks, the occasional extempore prayer), I eventually mastered both gears and not-hitting-anything. Only at that point did we actually go on any big roads, or ‘play with the traffic’ as she put it. My instructor did not believe that the best way to get me to improve my driving was by driving. Daisy Christodoulou would approve.

Actually, there was this book we were meant to complete at the end of each lesson. Michelle (my instructor) mostly ignored this, but occasionally she would write something – maybe the British School of Motoring does book looks – such as ‘improve clutch control’, knowing full well the futility of this: if I actually knew how to control a clutch, I bloody well would. She assessed that what I needed was lots and lots of safe practice of clutch control with nothing else to focus on. So most lessons (early on, anyway) were spent well away from other traffic, trying to change gears without stalling, jumping or screeching, with in-the-moment verbal feedback guiding me. And slowly I got better. If Michelle had had to account for my progress towards passing my driving test, she would have been in trouble. Whole areas of the curriculum, such as overtaking, turning right at a junction and keeping the correct distance between vehicles, were not even attempted until many months of lessons had taken place. Since we did not do mock versions of the driving test (until right near the very end), she was not able to show her managers a nice linear graph showing what percentage of the test I had, and had not yet, mastered. I would not have been ‘on track’. Did Michelle adapt my learning to fit in with these assessments? Of course not! She stuck with clutch control until I’d really got it and left ‘real driving’ to the future – even though this made it look like I was (literally) going nowhere, fast. Instead, Michelle just kept on making sure I mastered all the basics and gradually added in other elements as she thought I was ready for them. In the end, with the exception of parallel parking, I could do everything just about well enough. I passed on my third attempt.

I hope this extended metaphor helps explain Christodoulou’s critique of teaching and assessment practices in England today. Her book ‘Making Good Progress?’ explores why the assessment revolution failed to transform English education. After all, the approach was rooted in solid research and was embraced by both government and the profession. What could possibly go wrong?

One thing that went wrong, explains Christodoulou, is that instead of  teachers ‘using evidence of student learning to adapt…teaching…to meet student needs’[1], teachers adapted their teaching to meet the needs of their (summative) assessments. Instead of assessment for learning we got learning for assessment.

Obviously assessments don’t actually have needs themselves. But the consumers of assessment – and I use the word advisedly – do. There exist among us voracious and insatiable accountability monsters, who need feeding at regular intervals with copious bucketfuls of freshly churned data. Imagine the British School of Motoring held pupil progress meetings with their instructors. Michelle might have felt vulnerable that her pupil was stuck at such an early stage, looked at the driving curriculum and seen if there were some quick wins she could get ticked off before the next data drop. Preferably anything that doesn’t require you to drive smoothly in a straight line… signalling, for example.

But this wasn’t even the main thing that went wrong. Or rather, something was already wrong that no amount of AfL could put right. We were trying to teach skills like inference directly when, in fact, these, so Christodoulou argues, are best learnt more indirectly by learning other things first. Instead of learning to read books by reading books, one should start with technical details like phonics. Instead of starting with maths problem solving, one should learn some basic number facts. Christodoulou describes how what is deliberately practised – the technical detail – may look very different from the final skill in its full glory. Phonics practice isn’t the same as reading a book. Learning dates off by heart is not the same as writing a history essay. Yet the former is a necessary, if not sufficient, basis for the latter. To use my driving metaphor, practising an emergency stop on a deserted road at 10mph when you know it’s coming is very, very different from actually having to screech to a stop from 40mph on a rainy day in real life, when a child runs out across the road. Yet the former helped you negotiate the latter.

The driving test has two main parts: technical control of the vehicle and behaviour in traffic (a.k.a. playing with the traffic). It is abundantly clear that to play with the traffic safely, the learner must first have mastered a certain amount of technical control of the vehicle. Imagine Michelle had adopted the generic driving skill approach, assumed these technical matters could be picked up en route, in the course of generally driving about, and assumed that I could negotiate left and right turns at the same time as maintaining control of the vehicle. When I repeatedly stall, because the concentration it takes to both brake and steer distracts me from changing gears to match the slower speed, Michelle tells me that I did not change down quickly enough, which I find incredibly frustrating because I know I’ve got a gears problem, and it is my gears problem I need help with. But what I don’t get with the generic skill approach is time to practise changing gears up and down as a discrete skill. That would be frowned on as ‘decontextualised’. I might protest that I’d feel a lot safer doing a bit of decontextualised practice right now, but drill and practice is frowned upon – it isn’t real driving, after all – and in the actual test I am going to have to change gears and steer and brake all at the same time (and not hit anything), so better get used to it now.

Christodoulou argues that the direct teaching of generic skills leads to the kind of assessment practice that puts the cart, if not before the horse, then parallel with it. Under this approach, if you want the final fruit of a course of study to be an essay on the causes of the First World War, the route map to this end point will be punctuated with ‘mini-me’ variations of this final goal; shorter versions of the essay, perhaps. These shorter versions are then used formatively by the teacher, to give the learner feedback about the relative strengths and weaknesses of these preliminary attempts. All the learner then has to do, in theory, is marshal all this feedback together and address any shortcomings whilst retaining, and possibly augmenting, any strengths. However, this often leaves the learner none the wiser about precisely how to address their shortcomings. Advice to ‘be more systematic’ is only useful if you understand what being systematic means in practice, and if you already know that, you probably would have done so in the first place.[2]

It is the assessment of progress through interim assessments that strongly resemble the final exam that Christodoulou means by teaching to the test. Not because students shouldn’t know what format an exam is going to take and have a bit of practice on it towards the very end of a course of study. That’s not teaching to the test. Teaching to the test is working backwards from the final exam, writing a curriculum punctuated by slightly reduced versions of that exam, and then teaching each set of lessons with the next test in mind. The teaching is shaped by the approaching test. This is learning for assessment. By contrast, Christodoulou argues that we should just concentrate on teaching the curriculum, and that there is a whole range of other activities for assessing how this learning is going that may look nothing like the final learning outcome. These, she contends, are much better suited to helping the learner actually improve their performance. For example, the teacher might teach the students the key events in the build-up to the First World War, and then, by way of assessment, ask students to put these in correct chronological order on a timeline. Feedback from this sort of assessment is very clear: if events are in the wrong order, the student needs to learn them in the correct order. The teacher teaches some small component that will form part of the final whole, and tests that discrete part. Testing to the teach, in other words, as opposed to teaching to the test.

There are obvious similarities with musicians learning scales and sports players doing specific drills – getting the fine details off pat before trying to orchestrate everything together. David Beckham apparently used to practise free kicks from all sorts of positions outside the penalty area, until he was able to hit the top corner of the goal with his eyes shut. This meant that in the fury and flurry of a real, live game, he was able to hit the target with satisfying frequency. In the same way, Christodoulou advocates spending more time teaching and assessing progress in acquiring decontextualised technical skills, and less time on the contextualised ‘doing everything at once’, ‘playing with the traffic’ kind of tasks that closely resemble the final exam. Only when we do this, she argues, will assessment for learning be able to bear fruit. When the learning steps are small enough and comprehensible enough for the pupil to act on them, then and only then will AfL be a lever for accelerating pupil progress.

Putting my primary practitioner hat on, applying this approach in some areas (for example reading) chimes with what we already do, but in others (I’m thinking writing here) the approach seems verging on the heretical. Maths deserves a whole blog to itself, so I’m going to leave that for now – whilst agreeing whole-heartedly that thorough knowledge of times tables and number bonds (not just to ten but within ten and within twenty – so including 3+5 and 8+5, for example) is absolutely crucial. Indeed, I’d go so far as to say number bonds are even more important than times table knowledge, but harder to learn and rarely properly tested. I’ve mentioned Hit the Button in a previous blog. We have now created a simple spreadsheet that logs each child’s score from year 2 to year 6 in the various categories for number bonds. Children start with make 10 and stay on this until they score 25 or more (which means 25 correct in one minute, which I reckon equates to automatic recall). They then proceed through the categories in turn – with lower target scores of 15 for missing numbers and make 100. They skip the two decimals categories and go straight to the times tables section – which has division facts as well as multiplication facts. Yes! When they’ve got those off pat, they can return to do the decimals and the other categories. We’ve shared this, and the spreadsheet, with parents, and some children are practising at home each night. With each game only taking one minute, it’s not hard to insist that your child plays, say, three rounds of this before relaxing. In class, the teachers test a group each day, using their set of six iPads. However, since Kindle Fires were on sale for £34.99 recently, we’ve just bought ten of them (the same cost as one iPad). We’ll use them for lots of other things too, of course – anything where all you really need is access to an internet browser.
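For anyone who likes to see the rules written out, here is a minimal sketch, in Python, of the progression logic the spreadsheet embodies. The category order and the `next_category` helper are my own illustration (the real thing is just a spreadsheet with conditional formatting), and the target of 25 for the times tables and decimals categories is an assumption; the other thresholds are as described above.

```python
# Sketch of the progression rules: stay on a category until the one-minute
# score meets its target, then move on. Decimals are skipped on the first
# pass and returned to after times tables, as described above.
CATEGORIES = [
    ("make 10", 25),          # 25 correct in a minute = automatic recall
    ("missing numbers", 15),  # lower target, as above
    ("make 100", 15),
    ("times tables", 25),     # multiplication and division facts; target assumed
    ("decimals", 25),         # returned to last; target assumed
]

def next_category(current: str, score: int) -> str:
    """Return the category a child should practise next, given their latest score."""
    names = [name for name, _ in CATEGORIES]
    target = dict(CATEGORIES)[current]
    i = names.index(current)
    if score >= target and i + 1 < len(names):
        return names[i + 1]   # target met: move on
    return current            # otherwise, keep practising

print(next_category("make 10", 27))  # -> missing numbers
print(next_category("make 10", 19))  # -> make 10
```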

When we talk about mastery, people often talk about it as if it’s some elusive higher plane that the clever kids might just attain in a state of mathematical or linguistic nirvana, when really what it means is that every single child in your class – unless they have some really serious learning difficulty – has automatic recall of these basic number facts and (then later) their times tables. And can use full stops and capital letters correctly the first time they write something. And can spell every word on the year 3 and 4 word list (and year 1 and 2 as well, of course). And can read fluently – at least 140 words a minute – by the time they leave year 6. And has books they love to read, having read at least a million words for pleasure in the last year. (We use Accelerated Reader to measure this – about half of year 6 are word millionaires already this year and a quarter have read over 2 million words.) How about primary schools holding themselves accountable to their secondary schools for delivering cohorts of children who have mastered all of these (with allowances for children who have not been long at the school or who have special needs)? A bit like John Lewis is ‘Never Knowingly Undersold’, we should aim (among other things) to ensure that, at the very least, all our children who possibly could have got these basics securely under their belt.

(My teacher husband and I now pause to have an argument about what should make it to the final list.   Shouldn’t something about place value be included? Why just facts?  Shouldn’t there be something about being able to use number bonds to do something?  I’m talking about a minimum guarantee here – not specifying everything that should be in the primary curriculum. He obviously needs to read the book himself.)

Reading

My extended use of the metaphor of learning to drive to explain Christodoulou’s approach has one very obvious flaw: we usually teach classes of 30 children, whereas driving lessons are normally conducted 1:1. It is all very well advocating spending as much time on the basics as is necessary before proceeding to orchestrate several different skills all at the same time, but imagine the frustration the more able driver would have felt stuck in a class with me and my poor clutch control. They would want to be out there on the open roads, driving, not stuck behind me and my kangaroo petrol. Children arrive at our schools at various starting points. Some children pick up the sound–grapheme correspondences almost overnight; for others it takes years. I lent our phonics cards to a colleague to show her three-year-old over the weekend; by Monday he knew them all. Whereas another pupil, now in year 5, scored under 10 in both his KS1 phonics checks. I tried him again on it recently and he has finally passed. He is now just finishing turquoise books. In other words, he has just graduated from year 1 level reading, four years later. This despite daily 1:1 practice with a very skilled adult, reading from a decodable series he adores (Project X Code), as well as recently starting on the most decontextualised reading programme ever (Toe by Toe – which again he loves) and playing SWAP. He is making steady progress – which fills him with pride – but even if his secondary school carries on with the programme[3], at this rate he won’t really be a fluent reader until year 10. I keep hoping a snowball effect will occur and the rate of progress will dramatically increase.

Outliers aside, there is a range of ability (or prior attainment, if you prefer) in every class, and for something as technical as phonics this is most easily catered for by having children in small groups, depending on their present level. We use Read Write Inc in the early years and KS1. Children are assessed individually by the reading leader every half term for their technical ability to decode, segment and blend, and groups are adjusted accordingly. So that part of our reading instruction is pretty Christodoulou-compliant, as I would have thought it is in most infant classes. But what about the juniors, or late year 2, once the technical side is pretty much sorted and teachers turn to teaching reading comprehension? Surely, if ever a test was created solely for the purposes of being able to measure something, it was the reading comprehension test, with the whole of the KS2 reading curriculum one massive, time-wasting exercise in teaching to the test?

I am well aware of the research critiquing the idea that there are generic comprehension skills that can be learnt from specific texts and then applied across many others, as Daniel Willingham explores here. Christodoulou quotes Willingham several times in her book and her critique of generic skills is obviously influenced by his work. As Willingham explains, when we teach reading comprehension strategies we are actually teaching vocabulary, noticing understanding, and connecting ideas (my emphasis). In order to connect ideas (which is what inference is), the reader needs to know enough about those ideas to work out what hasn’t been said as well as what has been. Without specific knowledge, all the generic strategies in the world won’t help. As Willingham explains:

‘Inferences matter because writers omit a good deal of what they mean. For example, take a simple sentence pair like this: “I can’t convince my boys that their beds aren’t trampolines. The building manager is pressuring us to move to the ground floor.” To understand this brief text the reader must infer that the jumping would be noisy for the downstairs neighbors, that the neighbors have complained about it, that the building manager is motivated to satisfy the neighbors, and that no one would hear the noise were the family living on the ground floor. So linking the first and second sentence is essential to meaning, but the writer has omitted the connective tissue on the assumption that the reader has the relevant knowledge about bed‐jumping and building managers. Absent that knowledge the reader might puzzle out the connection, but if that happens it will take time and mental effort.’

So what the non-comprehending reader needs is very specific knowledge (about what it’s like to live in a flat), not some generic skill. It could be argued, then, that schools should spend more time teaching specific knowledge and less time teaching elusive and non-existent generic reading skills. However, Willingham concedes that the research shows that, even so, teaching reading comprehension strategies does work. How can this be, he wonders? He likens the teaching of these skills to giving someone vague instructions for assembling Ikea flat-pack furniture:

‘Put stuff together. Every so often, stop, look at it, and evaluate how it is going. It may also help to think back on other pieces of furniture you’ve built before.’

This is exactly the process we go through during shared reading. On top of our daily phonics lessons we have two short lessons a week of shared reading, where the class teacher models being a reader using the eric approach. In other words, we have daily technical lessons, and twice a week we also have a bit of ‘playing with the traffic’ – or, more accurately, listening to the teacher playing with the traffic and talking about what they are doing as they do it. In our shared reading lessons, by thinking out loud about texts, the teacher makes it very explicit that texts are meant to be understood and enjoyed, not just barked at, and that we should therefore check as we go along that we are understanding what we are reading (or looking at). If we don’t understand something, we should stop and ask ourselves questions. It is where the teacher articulates that missing ‘connective tissue’, or ‘previous experience of building furniture’ to use Willingham’s Ikea metaphor, sharing new vocabulary and knowledge of how the world works – knowledge that many of our inner-city children do not have. (Although actually, for this specific instance, many of them would know about noisy neighbours, bouncing on beds and the perils of doing so whilst living in flats.)


For example, this picture (used in the ‘eric’ link above) gives the teacher the opportunity to share their knowledge that sometimes the sea can get rough, and that this means the waves get bigger and the wind blows strongly. Sometimes it might blow so hard that it could even blow your hat right off your head. As the waves rise and fall, the ship moves up and down and tilts first one way, and then the other. (Pictures are sometimes used for this rather than texts so that working memory is relieved from the burden of decoding.)

When teaching children knowledge is extolled as the next panacea, it’s not that I don’t agree; it’s just that I reckon people really underestimate quite how basic some of the knowledge we need to impart to our younger children is. I know of primary schools proudly adopting a ‘knowledge curriculum’ and teaching two hours of history a week, with two years given over to learning about the Ancient Greeks. I just don’t see how this will help children understand texts about noisy neighbours, or about what the sea is like (although you could do that in the course of learning about Ancient Greece if you realised children didn’t know), or, for that matter, what it is like to mill around in bewilderment. The only kind of assessment that will help here is the teacher’s ‘ear to the ground’, minute-by-minute assessment – realising that, oh, some of them haven’t ever seen the sea, or been on a boat; they don’t know about waves, or how windy it can be, or how you rock up and down. This is the kind of knowledge that primary teachers in disadvantaged areas need to talk about all the time. And why we need to go on lots of trips too. But it is not something a test will pick up, nor something you can measure progress gains in. The only way to increase vocabulary is one specific word at a time. It is also why we should never worry about whether something is ‘relevant’ to the children or not. If it is too relevant, then they already know about it – the more irrelevant the better.

I don’t entirely agree with the argument that, since we can’t teach generic reading skills, we should instead teach lots more geography and history, since this will give children the necessary knowledge to understand what they read. We need to read and talk, talk, talk about stories and their settings – not just what a mountain is but how it feels to climb a mountain or live on a mountain, how that affects your daily life, how you interact with your neighbours. We need to read more non-fiction aloud, starting in the early years. We need to talk about emotions and body language and what the author is telling us by showing us. A quick google will turn up writers’ body-language ‘cheat sheets’. We need to reverse-engineer these and explain that if the author has their character fidgeting with darting eyes, that probably means they are feeling nervous. Some drama probably wouldn’t go amiss either. Willingham’s trio of teaching vocabulary, noticing understanding, and connecting ideas is a really helpful way for primary teachers to think about what they are doing when they teach reading comprehension. What we need to assess, and feed back to children on, is how willing they are to admit they don’t understand something, to ask what a word means, to realise they must be missing some connection. None of this is straightforwardly testable. That doesn’t mean it isn’t important.

Writing

Whereas most primary schools, to a greater or lesser degree, teach reading by first teaching phonics, writing is much more likely to be taught through students writing than through a series of sub-skills. It is the idea that we should ensure technical prowess before we spend too much time on creative writing that most challenges the way we currently do things.

Of course we teach children to punctuate their sentences with capital letters and full stops right at the start of their writing development. However, patently, this instruction has limited effectiveness for many children. They might remember when they are at the initial stages and only writing one sentence anyway – not so hard to remember the final full stop in that case. Where it all goes wrong is once they start writing more than one sentence, further complicated when they start writing sentences with more than one clause. I’ve often thought we underestimate how conceptually difficult it is to understand what a sentence actually is. Learning to use speech punctuation is far easier than learning what is, and what is not, a sentence. Many times we send children back to put in their full stops when, actually, they don’t really get where full stops go. On my third session doing 1:1 tuition with a year 5 boy, he finally plucked up the courage to tell me that he knew he should use them but he just didn’t get how you knew where sentences ended. So I abandoned what I’d planned and instead we learnt about sentences. I told him that sentences had a person or a thing doing something, and that after those two crucial bits we might get some extra information about where or why or with whom or whatever, which belongs with the person/thing and so needs to be in the same sentence. We analysed various sentences, underlining the person/thing in one colour, the doing-something word in another colour and finally the extra information (which could be adjectives, adverbs, prepositions, the object of the sentence – the predicate minus the verb, basically) in another. This was some time ago, before the renaissance of grammar teaching, so it never occurred to me to use the terms ‘subject’, ‘noun’, ‘verb’ etc., but I would do now. It was all done on the hoof, but after three lessons he had got it and, even better, could apply it in his own writing.

What Christodoulou is advocating is that, instead of waiting until things have got so bad they need 1:1 tuition to put right, we systematically teach sentence punctuation (and other common problems such as subject–verb agreement), giving greater priority to this than to creative writing. In other words, stop playing with the traffic before you’ve mastered sufficient technical skills to do so properly. This goes against normal primary practice, but I can see the sense in it. If ‘practice makes permanent’, as cognitive psychology tells us (see chapter 7 of What Every Teacher Needs to Know About Psychology by Didau and Rose for more on this), then the last thing we want is for children to practise doing something incorrectly again and again. But this is precisely what our current practice does. Because most of the writing we ask children to do is creative writing, children who can’t punctuate their sentences get daily practice in doing it wrong. The same goes for letter formation and the spelling of high-frequency common exception words. Maybe instead we need to spend far more time in the infants, and into year 3 if necessary, doing drills where we punctuate text without the added burden of composing as we go. Maybe this way working memories would not become so overburdened with thinking about what to say that the necessary technicalities went out the window. After that, we could rewrite this correctly punctuated text in correctly formed handwriting. Some children have genuine handwriting or spelling problems, and I wouldn’t want to condemn dyslexic and dyspraxic children to permanent technical practice. However, if we did more technical practice in the infants – which would mean less time for writing composition – we might spot who had a genuine problem earlier and then put in place specific programmes to help them and/or aids to get round the problem another way. After all, not all drivers use manual transmission; some drive automatics.

Christodoulou mentions her experience of using the ‘Expressive Writing’ direct instruction programme, which I duly ordered. I have to say it evoked a visceral dislike in me: nasty cheap paper, crude line drawings, totally decontextualised – it’s everything my primary soul eschews (and it’s expensive to boot). However, the basic methodology is sound enough, and Christodoulou only mentions it because it is the one she is familiar with; it is not like she’s giving it her imprimatur or anything. I’m loath to give my teachers more work, but I don’t think it would be too hard to invent some exercises that are grounded in the context of something else children are learning: some sentences about Florence Nightingale or the Fire of London, for example, or a punctuation-free excerpt from a well-loved story. Even if we only did a bit more of this and a bit less of writing compositions where we expect children to orchestrate many skills all at once, we should soon see gains in children’s creative writing too. Certainly, we should insist on mastery of these core writing skills by year 3 and, where children still can’t punctuate a sentence, be totally ruthless in focusing on that until the problem is solved. And I don’t just mean that they can edit in their full stops after the fact; I mean they put them (or almost all of them) in as they write. It needs to become an automatic process. Once it is automatic, it is easy. Otherwise we are not doing them any favours in the long term, as we are just making their error more and more permanent and harder and harder to undo.

Certainly, pupil progress meetings would be different. Instead of discussing percentages and averages, the conversation would be very firmly about the teacher sharing the gaps in knowledge they had detected, the plans they had put in place to bridge those gaps, and progress to date in so doing – maybe courtesy of the ‘hit the button’ spreadsheet, some spelling tests, end-of-unit maths tests, records of increasing reading fluency. Already, last July our end-of-year reports shared with parents which number facts, times tables and spellings (from the year word lists) their child did not yet know… with the strong suggestion that the child work on these over the summer! We are introducing ‘check it’ mini assessments so that we can check that what we taught three weeks ago is still retained. It’s easy: we just test to the teach.

[1] Christodoulou quoting Dylan Wiliam, Making Good Progress?, p. 19.

[2] Christodoulou quoting Dylan Wiliam, Making Good Progress?, p. 20.

[3] I say this because our local secondary school told me they didn’t believe in withdrawing children from class for interventions. Not even reading interventions. Surely he could miss MFL and learn to read in English first? As a minimum.  Why not English lessons? I know he is ‘entitled’ to learn about Macbeth but at the expense of learning to read? Is Macbeth really that important? Maybe he will go to a different secondary school or they’ll change their policy.


Milling around in bewilderment: that reading comprehension.

You can see the reading test for yourself via this link.

The day started well, dawn casting spun-gold threads across a rosy sky. The long wait was over; SATs week was finally here. And it looked like summer had arrived. Year 6 tripped into classrooms while head teachers fumbled skittishly with secret keys in hidden cupboards. Eventually, teachers across the nation ripped open plastic packets. Perhaps at first their fears were calmed, for the text – or what you can glean about it from reading snippets here and there as you patrol the rows – didn’t seem too bad. In previous weeks children had struggled with excerpts from The Lady of Shalott, Moonfleet, Shakespeare. The language here looked far more contemporary.

But no. Upon completion, children declared the test was hard – really hard. Many hadn’t finished – including children who usually tore through tests like a… white giraffe? What is more, the texts didn’t seem to be in any kind of order. We had drilled into them, as per the test specification guide (section 6.2), that the texts would increase in difficulty throughout the paper. Yet the middle text was almost universally found to be the hardest. Some declared the final text the easiest. What was going on?

Tests safely dispatched, I decided to take a proper look. It didn’t take long for it to become apparent that the texts contained demanding vocabulary and some tortuous sentence structure. The difference from the sample test material was stark. Twitter was alive with tales of sobbing kids and angry teachers. Someone said they had analysed the first paragraph of the first text and it came out with a reading age of 15. Debate followed: was this really true or just a rumour? Were readability tests reliable? I tweeted that it was a test of how middle class and literary one’s parents were, having identified 45 words I reckoned might challenge our inner-city children. After all, as a colleague remarked, ‘my three-year-old knows more words than some children here’. Other people drew groans by mentioning how irrelevant the texts were to the kind of lives their children lived. I seemed to be implicated in this criticism… although it’s difficult to tell who’s criticising whom sometimes on Twitter. Still, I was put out. I don’t care if texts are ‘relevant’, I retorted. I cared that the vocabulary needed to answer the questions favoured a ‘posh’ demographic. Apparently, this was patronising. I saw red at this point! It’s not that poorer children can’t acquire a rich vocabulary, but that since it is well known that a rich vocabulary is linked to parental income, and the domain ‘rich vocabulary’ is huge (and undefined), it is not fair or useful to use tests that rely on good vocabulary for accountability. And then I put up a link to this previous post of mine, where I’ve explained this in more depth. If accountability tests over-rely on assessing vocabulary as a proxy for assessing reading, this hands a free pass to schools chock-full of children like my colleague’s three-year-old, since such children arrive already stuffed to the gills with eloquence and articulacy. Whereas the poorer the intake, the greater the uphill struggle to enable the acquisition of the kind of cultural capital richer children imbibe with their mother’s milk.

Flawed as the previous reading tests were, they did not stack the cards against schools serving language-poor populations. The trouble with using vocabulary as a measure is that each individual word is so specific. Usually what we teach is generalisable from one context to another. Learning words, however, has to be done on a case-by-case basis. I recently taught year 6 somnolent, distraught and clandestine, among many others. I love teaching children new words, and they love acquiring them. But unless there is some sort of finite list against which we are to be judged, I’d rather not have our school judged by a test that is hard to pass without an expansive and sophisticated vocabulary. With the maths and SPaG tests, we know exactly what is going to be tested. The domain is finite. We worry about how to teach it so it is understood and remembered, but we do not worry that some arcane bit of maths will worm its way into the test. Nautical miles, for example. Not so with reading. Any word within the English language is fair game – including several that don’t regularly appear in the vocabulary of the average adult. There may be very good reasons for the government to want to ascertain the breadth of vocabulary acquisition across the nation. In which case, they could instigate a vocabulary test – maybe something along the lines of this. But that shouldn’t be confused with the ability to read. To return to our earlier example, my colleague’s three-year-old may have an impressive vocabulary but she can’t actually read much at all yet. Whereas our 11-year-olds may not know as many words but are happily enjoying reading the Morris Gleitzman ‘Once’ series.

It is becoming accepted that reading is not just the orchestration of a set of skills, but requires knowledge of the context for the reader to make sense of the bigger picture. But that’s not what happened here. It’s not the case that children found the texts difficult because they lacked knowledge of the context. The context of the first text was two children exploring outdoors. True, only 50% of our present year 6 knew what a monument was at the outset – a bit tricky since this was pretty central to the test – but by the end of the story they had sort of worked it out for themselves. The second text featured a young girl disobeying her grandmother and taking risks. And a giraffe. Well, I reckon this is pretty familiar territory (grandmothers and risks, I mean) and while we do not meet giraffes every day in Bethnal Green, we know what they are. The third and final text told us all about dodos and how they may have been unfairly maligned by Victorian scientists. So that was a bit more remote from everyday experience, but not so terribly outlandish as to render the text impenetrable. The third text is meant to be harder, and the children are meant to have studied evolution and extinction by then in science anyway. So it wasn’t that the Sitz im Leben was so abstruse as to render comprehension impossible. The problem was the words used within the texts and the high number of questions which were dependent upon knowing what those words meant. The rather convoluted sentence structure in the second text didn’t help either – but if the words had been more familiar, children might have stood more of a fighting chance.

According to the test specification, questions can be made difficult in any of five different ways. These five ways are based on research commissioned by the PISA guys. It’s an interesting and informative read – so I’m not arguing with the methodology per se. I don’t know nearly enough to even attempt that. Amateur though I am, I do argue with the relative proportions allocated to each of the five strategies in this test.

With three of these, I have no quarrel. Firstly (and my ordering is different from that in the document), questions can be made easier or harder in terms of accessibility: how easy is it to find the information? Is the student signposted to it (e.g. ‘see the first paragraph on page 2’)? Or is the question difficulty raised by not signposting, and possibly by having distractor items to lure students down dead ends? I think we have little to complain about here. For example, question 30 has clear signposting (‘Look at the paragraph beginning: Then, in 2005…’), whereas in question 33 the relevant information is much harder to find – it’s a ‘match the summary of the paragraph to the order in which they occur’ question.

Secondly, questions may vary in terms of task-specific complexity. How much work does the student have to do to answer the question?  Is it a simple information retrieval task or does the pupil have to use inference?

For example, question 7 is easy in this regard: ‘Write down three things you are told about the oak tree.’ The text clearly says the oak tree was ‘ancient’. I haven’t checked the mark scheme, as it’s not yet published as I write, but I am assuming that’s enough to earn you one mark. Question 3 is a bit harder: ‘How can you tell that Maria was very keen to get to the island?’ Students need to infer this from the fact that she said something ‘impatiently’. There are far fewer of this kind of question under the new regime – but we were expecting that, and the sample paper demonstrated it. Again, no complaints. Indeed, the test specification does share the relative weightings of different skills (in section 6.2.2, table 9), but the bands are so wide it’s all a bit meaningless. Inference questions can make up anywhere between 16% and 50% of all questions, for example.

Thirdly, the response strategy can be more or less demanding: a one-word answer versus a three-mark ‘explain your opinion’ question.

The two final ways to make questions more or less difficult are by varying the extent of the knowledge of vocabulary required by the question (strategy 5 in the specification document), or by varying the complexity of the target information needed to answer the question (strategy 2). The document goes on to explain that this means varying:

• the lexico-grammatical density of the stimulus

• the level of concreteness / abstractness of the target information

• the level of familiarity of the information needed to answer the question

and that ‘there is a low level of semantic match between task wording and relevant information in the text’.

I’m not quite sure what the difference is between ‘lexico-grammatical density’ (strategy 2) and the knowledge of vocabulary required by the question (strategy 5), but the whole thrust of this piece is that the texts were pretty dense both lexico-grammatically and in terms of the vocabulary needed to answer the questions. Compared with the sample test, the contrast is stark. Now, I’m no expert in linguistics or test question methodology. I’m just a headteacher with an axe to grind, a weekend to waste and access to Google. But this has infuriated me enough to do a fair bit of reading around the subject.

On the Monday evening post-test, Twitter was alive with people quoting someone who apparently had said that the first paragraph of the first text had a Flesch–Kincaid reading ease equivalent to that of a 15-year-old. I’d never heard of Flesch–Kincaid – or any of the other readability tests – so I did some research and found that, indeed, the first paragraph of The Lost Queen was described as suitable for 8th-9th graders, or 13-15 year olds in the British system. But there was also criticism online that the readability tests rated the same texts quite differently, so weren’t a reliable indicator of much. (Someone put a link up to an article about this, which I foolishly forgot to bookmark and now can’t find – do share the link again if it was you or you know a good source.)

Anyway, be that as it may, I decided to run readability tests on various bits and pieces of the SATs paper. And this is what I discovered (texts listed in order of alleged difficulty):

The Lost Queen, first paragraph: 13-15 year olds

Wild Ride, first paragraph: 13-15 year olds

Wild Ride, the ‘bewildered’ paragraph: 18-22 year olds

The Way of the Dodo, first paragraph: 13-15 year olds (and a lower score than The Lost Queen)

The Way of the Dodo, second paragraph: 13-15 year olds

So there you have it: insofar as Flesch–Kincaid has any reliability, the supposedly hardest text was in fact the easiest, and the middle text was the hardest.

I did the same with the sample paper. The first text had a readability level of an 11-12 year-old and the second 13-15. I had lost the will to live by then so didn’t do the third text – but it is clearly much more demanding than the previous two, as it should be.

I also used the Automated Readability Index, and while this gave slightly different age ranges, the relative difficulty was the same and all the texts were for children older than 11 – the easiest being… the first part of The Way of the Dodo.
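If you want to replicate this at home, both measures are just arithmetic over counts of sentences, words and syllables (or characters). Here is a rough Python sketch; the two formulas are the standard published ones, but my syllable counter is a crude vowel-group heuristic, which is exactly the sort of reason different readability tools score the same text differently.

```python
import re

def counts(text: str):
    """Very rough sentence, word, character and syllable counts."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    chars = sum(len(w) for w in words)
    # crude syllable estimate: runs of vowels per word, minimum one
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return sentences, len(words), chars, syllables

def flesch_kincaid_grade(text: str) -> float:
    """US school grade level; add roughly five for a UK reading age."""
    s, w, _, syl = counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59

def automated_readability_index(text: str) -> float:
    s, w, c, _ = counts(text)
    return 4.71 * (c / w) + 0.5 * (w / s) - 21.43

sample = ("The horses pranced skittishly while the children milled "
          "around in bewilderment, their momentum receding sedately.")
print(round(flesch_kincaid_grade(sample), 1))
print(round(automated_readability_index(sample), 1))
```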

However, it was also clear from my reading that readability tests are designed to help people writing, say, pamphlets for the NHS make the writing as transparent and easy as possible. In other words, they are intended to make reading simple so that people who aren’t very good at it can understand stuff that may be very important. It struck me that maybe this wasn’t exactly what we should be aiming for in a reading assessment. After all, we do want some really challenging questions at some point. We just want them at the end, where they are meant to be. We need readability tests because previous generations have not been taught well enough to be presented with demanding information. We want better for the children we now teach.

Which brought me to discover this site, which ranks words by their relative frequency in the English language. If we are going to be held accountable for the sophistication of the vocabulary our children can comprehend, then surely there should be some bounds on that. While the authority of this is contested, it seems to be generally held that the average adult knows about 20,000 words. You can test yours here. How many words the average 11-year-old does or should know I did not discover – so here are my ballpark suggestions.

For the first text – the one that is meant to be easier – there should be a cap on rankings at 10,000: no rarer words allowed. (I’m assuming here we understand that as words are used less frequently their ranking falls but the actual number rises: a ranking of 20,000 is lower than a ranking of 10,000. If this is not the correct convention for such matters, I apologise.) Definitions should be given for low-frequency words, especially if understanding them is critical to answering specific questions – in the same way in which savannah was explained in the introduction to Wild Ride.

Then in the second text words could be limited to a ranking of 15,000, and in the third 20,000 – representing the average adult’s vocabulary. I have plucked these figures from the air. I would not go to the stake for them. But you get my meaning. We need to pin down the domain of ‘vocabulary’ if we are to be held accountable when it is tested.
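To make the proposal concrete, here is a hypothetical sketch of the kind of check a test writer could run before signing a text off. The rankings and caps below are illustrative only; a real check would load the full frequency list linked above.

```python
# Hypothetical vocabulary-cap check for a reading test.
# RANK holds a tiny illustrative sample; a real check would use the
# full 60,000-word frequency list linked above.
RANK = {"monument": 4106, "haze": 8307, "sedately": 38421, "parched": 46169}

CAPS = {1: 10_000, 2: 15_000, 3: 20_000}  # text number -> maximum permitted ranking

def words_over_cap(words: list[str], text_number: int) -> list[str]:
    """Flag words rarer than the cap, or missing from the list altogether."""
    cap = CAPS[text_number]
    return [w for w in words if RANK.get(w, 60_001) > cap]

print(words_over_cap(["monument", "haze", "sedately"], 1))  # -> ['sedately']
print(words_over_cap(["monument", "haze", "sedately"], 3))  # -> ['sedately']
```

Any word the check flagged would either be swapped for a more frequent one or glossed, the way savannah was in Wild Ride.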

For what it is worth, I asked our year 6 after the test to tell me which words they did not know. There are 30 children in the class. Words where half the class or more did not know the meaning included, from the first text: monument, haze, weathered (as a verb); from the second text: jockey, dam, promptly, sedately (zero children), counselled, arthritic, nasal, pranced, skittishly (zero children), milled, bewildered, spindly, momentum; and from the third text: haven, oasis (they knew this was a brand of drink, though), parched, receding, rehabilitate and anatomy. My Geordie partner tells me they would have known parched if they were northern, because that’s Geordie for ‘I’m really thirsty.’ Here we can see again that the middle passage had the highest number of unknown words in my obviously unrepresentative sample. In fact, it was the first paragraph on page 8 – which henceforth shall be known as the bewildering paragraph – that seemed to have the highest lexico-grammatical density. As the mud flats entrapped the Mauritian dodos, so did this paragraph ensnare our readers, slowing them down to the extent that they failed to finish the questions pertaining to the relatively easy final text.

Maybe I’m wrong. Maybe when the statistics are finally in, there won’t be a starker-than-usual demarcation along class lines. I’d love to be wrong. Let’s hope I am.

And finally, what you’ve all been waiting for: what was the lowest-ranking word? Well, yes, of course, it was ‘skittishly’ – so rare it doesn’t even appear in the database of 60,000 words I was using. But suitable for 11-year-olds, apparently.

In case you are interested, here are the full rankings. Where the word might be more familiar as a different part of speech, I have included a ranking for that word too, in italics. The words I chose to rank were just those my deputy and I thought children might find tricky.

(TLQ = The Lost Queen, WR = Wild Ride, WD = The Way of the Dodo)

| Word (organized by rank, lowest to highest) | Part of speech | Ranking | Text | Necessary (N)/useful (U) for question number |
| --- | --- | --- | --- | --- |
| skittishly | adverb | not in top 60,000 | WR | |
| parch | adjective | 46,169 | WD | E29 |
| sedately | adverb | 38,421 | WR | |
| clack | noun | 32,467 | TLQ | 4 distractor |
| *sedate* | verb | 23,110 | | |
| prance | verb | 22,360 | WR | |
| *skittish* | adjective | 21,298 | | |
| *sedate* | adjective | 20,481 | | |
| arthritic | adjective | 20,107 | WR | |
| mossy | adjective | 19,480 | TLQ | U8 distractor; E9 |
| misjudge | verb | 19,140 | WD | |
| spindly | adjective | 19,025 | WR | |
| bewilderment | noun | 17,410 | WR | E16 |
| squeal | noun | 17,103 | WR | |
| sternly | adverb | 16,117 | WR | |
| plod | verb | 16,053 | WR | |
| prey | verb | 15,771 | WD | |
| dismount | verb | 15,601 | WR | |
| hush | noun | 15,394 | TLQ | N4 |
| burrow | noun | 14,900 | WR | |
| *hush* | verb | 14,295 | | |
| nocturnal | adjective | 13,755 | WR | E14 |
| enraged | adjective | 13,378 | WR | |
| mill | verb | 13,378 | WR | E16 |
| sprint | noun | 12,187 | WR | |
| *squeal* | verb | 12,036 | | |
| folklore | noun | 11,722 | WD | |
| rehabilitate | verb | 11,496 | WD | E31 |
| oasis | noun | 10,567 | WD | U29 |
| *stern* | adjective | 10,377 | | |
| *moss* | noun | 10,142 | | |
| evade | verb | 9,759 | WR | |
| sight | verb | 9,730 | WD | |
| jockey | noun | 9,723 | WR | |
| weather | verb | 9,568 | TLQ | U8 |
| blur | noun | 9,319 | WR | |
| murky | adjective | 9,265 | TLQ | U6 |
| inscription | noun | 9,164 | TLQ | U8 |
| rein | noun | 8,793 | WR | |
| *sprint* | verb | 8,742 | | |
| slaughter | noun | 8,494 | WD | |
| anatomy | noun | 8,310 | WD | E32 |
| haze | noun | 8,307 | TLQ | 4 distractor |
| counsel | verb | 7,905 | WR | |
| nasal | adjective | 7,857 | WR | |
| recede | verb | 7,809 | WD | |
| intent | adjective | 7,747 | WR | |
| stubborn | adjective | 7,680 | WR | E15 |
| slab | noun | 7,585 | TLQ | U8 |
| arthritis | noun | 7,498 | WR | |
| promptly | adverb | 6,762 | WR | |
| *blur* | verb | 6,451 | | |
| haven | noun | 5,770 | WD | |
| vine | noun | 5,746 | TLQ | U6 |
| defy | verb | 5,648 | WR | E15 |
| startle | verb | 5,517 | WR | |
| drought | noun | 5,413 | WD | |
| remains | noun | 5,375 | WD | |
| devastating | adjective | 4,885 | WD | |
| *rehabilitation* | noun | 4,842 | | |
| *prey* | noun | 4,533 | | |
| dam | noun | 4,438 | WR | |
| momentum | noun | 4,400 | WR | E18 |
| ancestor | noun | 4,178 | TLQ | N1 |
| monument | noun | 4,106 | TLQ | U8 |
| dawn | noun | 4,044 | WR | N12a |
| *intent* | noun | 3,992 | | |
| *click* | noun | 3,822 | | |
| *counsel* | noun | 3,441 | | |
| indication | noun | 3,401 | WD | |
| *prompt* | adjective | 3,142 | | |
| *mount* | verb | 3,012 | | |
| urge | verb | 2,281 | WR | |
| cast | verb | 2,052 | WR | |
| *judge* | verb | 1,764 | | |
| unique | adjective | 1,735 | WD | E25 |
| *weather* | noun | 1,623 | | |
| *sight* | noun | 1,623 | | |
| Word (organized by where they appear in the texts) | Part of speech | Ranking | Text | Necessary (N)/useful (U) for question number |
| --- | --- | --- | --- | --- |
| monument | noun | 4,106 | TLQ | U8 |
| ancestor | noun | 4,178 | TLQ | N1 |
| clack | noun | 32,467 | TLQ | 4 distractor |
| hush | noun | 15,394 | TLQ | N4 |
| *hush* | verb | 14,295 | | |
| haze | noun | 8,307 | TLQ | 4 distractor |
| vine | noun | 5,746 | TLQ | U6 |
| murky | adjective | 9,265 | TLQ | U6 |
| weather | verb | 9,568 | TLQ | U8 distractor; E9 |
| *weather* | noun | 1,623 | | |
| mossy | adjective | 19,480 | TLQ | U8 distractor; E9 |
| *moss* | noun | 10,142 | | |
| inscription | noun | 9,164 | TLQ | U8 |
| slab | noun | 7,585 | TLQ | U8 |
| dawn | noun | 4,044 | WR | N12a |
| cast | verb | 2,052 | WR | |
| jockey | noun | 9,723 | WR | |
| dam | noun | 4,438 | WR | |
| startle | verb | 5,517 | WR | |
| nocturnal | adjective | 13,755 | WR | E14 |
| promptly | adverb | 6,762 | WR | |
| *prompt* | adjective | 3,142 | | |
| stubborn | adjective | 7,680 | WR | E15 |
| defy | verb | 5,648 | WR | |
| sedately | adverb | 38,421 | WR | |
| *sedate* | verb | 23,110 | | |
| *sedate* | adjective | 20,481 | | |
| plod | verb | 16,053 | WR | |
| arthritic | adjective | 20,107 | WR | |
| arthritis | noun | 7,498 | WR | |
| nasal | adjective | 7,857 | WR | |
| squeal | noun | 17,103 | WR | |
| *squeal* | verb | 12,036 | | |
| burrow | noun | 14,900 | WR | |
| prance | verb | 22,360 | WR | |
| skittishly | adverb | not in top 60,000 | WR | |
| intent | adjective | 7,747 | WR | |
| *intent* | noun | 3,992 | | |
| enraged | adjective | 13,378 | WR | |
| squeal | noun | 17,103 | WR | |
| *squeal* | verb | 12,036 | | |
| mill | verb | 13,378 | WR | E16 |
| bewilderment | noun | 17,410 | WR | E16 |
| spindly | adjective | 19,025 | WR | |
| evade | verb | 9,759 | WR | |
| momentum | noun | 4,400 | WR | E18 |
| urge | verb | 2,281 | WR | |
| *sprint* | verb | 8,742 | | |
| sprint | noun | 12,187 | WR | |
| blur | noun | 9,319 | WR | |
| *blur* | verb | 6,451 | | |
| dismount | verb | 15,601 | WR | |
| *mount* | verb | 3,012 | | |
| sight | verb | 9,730 | WD | |
| *sight* | noun | 1,623 | | |
| haven | noun | 5,770 | WD | |
| slaughter | noun | 8,494 | WD | |
| unique | adjective | 1,735 | WD | E25 |
| prey | verb | 15,771 | WD | |
| *prey* | noun | 4,533 | | |
| folklore | noun | 11,722 | WD | |
| remains | noun | 5,375 | WD | |
| drought | noun | 5,413 | WD | |
| oasis | noun | 10,567 | WD | U29 |
| parch | adjective | 46,169 | WD | E29 |
| recede | verb | 7,809 | WD | |
| rehabilitate | verb | 11,496 | WD | E31 |
| indication | noun | 3,401 | WD | |
| anatomy | noun | 8,310 | WD | E32 |
| misjudge | verb | 19,140 | WD | |
| *judge* | verb | 1,764 | | |
| devastating | adjective | 4,885 | WD | |

By way of contrast, I did the same with the sample test. In the first text there were no words I thought were hard enough to check. In the second there were four: cover (15,363), pitiful (13,211), brittle (10,462) and emerald (12,749). In the third and final passage there were eight: triumphantly (16,343), glade (20,257), unwieldy (16,922), sapling (16,313), foliage (7,465), lurch (9,339), ecstasy (9,629) and finally, ranking off the scale of the 60,000-word database, gambols.
