You can see the reading test for yourself from this link.
The day started well; dawn casting spun-gold threads across a rosy sky. The long wait was over; SATs week was finally here. And it looked like summer had arrived. Year 6 tripped into classrooms while head teachers fumbled skittishly with secret keys in hidden cupboards. Eventually teachers across the nation ripped open plastic packets. Perhaps at first their fears were calmed, for the text – or what you can glean about it from reading snippets here and there as you patrol the rows – didn’t seem too bad. In previous weeks children had struggled with excerpts from The Lady of Shalott, Moonfleet and Shakespeare. The language here looked far more contemporary.
But no. Upon completion children declared the test was hard – really hard. Many hadn’t finished – including children who usually tore through tests like a…white giraffe? What is more, the texts didn’t seem to be in any kind of order. We had drilled into them, as per the test specification guide, that the texts would increase in difficulty throughout the paper (section 6.2). Yet the middle text was almost universally found to be the hardest. Some declared the final text the easiest. What was going on?
Tests safely dispatched, I decided to take a proper look. It soon became apparent that the texts contained demanding vocabulary and some tortuous sentence structure. The contrast with the sample test material was stark. Twitter was alive with tales of sobbing kids and angry teachers. Someone said they had analysed the first paragraph of the first text and it came out with a reading age of 15. Debate followed: was this really true or just a rumour? Were readability tests reliable? I tweeted that it was a test of how middle class and literary one’s parents were, having identified 45 words I reckoned might challenge our inner-city children. After all, as a colleague remarked, ‘my three-year-old knows more words than some children here’. Other people drew groans by mentioning how irrelevant the texts were to the kind of lives their children lived. I seemed to be implicated in this criticism…although it’s sometimes difficult to tell who’s criticising whom on Twitter. Still, I was put out. I don’t care whether texts are ‘relevant’, I retorted; I care that the vocabulary needed to answer the questions favoured a ‘posh’ demographic. Apparently, this was patronising. I saw red at this point! It’s not that poorer children can’t acquire a rich vocabulary. It’s that a rich vocabulary is well known to be linked to parental income, and the domain of ‘rich vocabulary’ is huge (and undefined), so it is neither fair nor useful to use tests that rely on good vocabulary for accountability. And then I put a link to this previous post of mine, where I’ve explained this in more depth. If accountability tests over-rely on assessing vocabulary as a proxy for assessing reading, they hand a free pass to schools chock-full of children like my colleague’s three-year-old, since such children arrive already stuffed to the gills with eloquence and articulacy, whereas the poorer the intake, the greater the uphill struggle to help children acquire the kind of cultural capital richer children imbibe with their mother’s milk.
Flawed as the previous reading tests were, they did not stack the cards against schools serving language-poor populations. The trouble with using vocabulary as a measure is that each individual word is so specific. Usually what we teach is generalisable from one context to another. Learning words, however, has to be done on a case-by-case basis. I recently taught year 6 somnolent, distraught and clandestine, among many others. I love teaching children new words, and they love acquiring them. But unless there is some sort of finite list against which we are to be judged, I’d rather not have our school judged by a test that is hard to pass without an expansive and sophisticated vocabulary. With the maths and SPAG tests, we know exactly what is going to be tested. The domain is finite. We worry about how to teach it so it is understood and remembered, but we do not worry that some arcane bit of maths will worm its way into the test. Nautical miles, for example. Not so with reading. Any word within the English language is fair game – including several that don’t regularly appear in the vocabulary of the average adult. There may be very good reasons for the government to want to ascertain the breadth of vocabulary acquisition across the nation. In that case, they could instigate a vocabulary test – maybe something along the lines of this. But that shouldn’t be confused with the ability to read. To return to our earlier example, my colleague’s three-year-old may have an impressive vocabulary, but she can’t actually read much at all yet, whereas our 11-year-olds may not know as many words but are happily enjoying Morris Gleitzman’s ‘Once’ series.
It is becoming accepted that reading is not just the orchestration of a set of skills, but requires knowledge of the context for the reader to make sense of the bigger picture. But that’s not what happened here. It’s not the case that children found the texts difficult because they lacked knowledge of the context. The context of the first text was two children exploring outdoors. True, only 50% of our present year 6 knew what a monument was at the outset – a bit tricky, since this was pretty central to the test – but by the end of the story they sort of worked it out for themselves. The second text featured a young girl disobeying her grandmother and taking risks. And a giraffe. Well, I reckon this is pretty familiar territory (grandmothers and risks, I mean), and while we do not meet giraffes every day in Bethnal Green, we know what they are. The third and final text told us all about dodos and how they may have been unfairly maligned by Victorian scientists. That was a bit more remote from everyday experience, but not so terribly outlandish as to render the text impenetrable. The third text is meant to be harder, and by then the children are meant to have studied evolution and extinction in science anyway. So it wasn’t that the Sitz im Leben was so abstruse as to render comprehension impossible. The problem was the words used within the texts and the high number of questions that depended on knowing what those words meant. The rather convoluted sentence structure in the second text didn’t help either – but if the words had been more familiar, children might have stood more of a fighting chance.
According to the test specification, questions can be difficult in one of five different ways. These five ways are based on research commissioned by the PISA guys. It’s an interesting and informative read – so I’m not arguing with the methodology per se. I don’t know nearly enough to even attempt that. Amateur though I am, I do argue with the relative proportions allocated to each of the five strategies in this test.
With three of these, I have no quarrel. Firstly (and my ordering is different from that in the document), questions can be made easier or harder in terms of accessibility: how easy is it to find the information? Is the student signposted to it (e.g. ‘see the first paragraph on page 2’)? Or is the question’s difficulty raised by not signposting, and possibly by adding distractor items to lure students down dead ends? I think we have little to complain about here. For example, question 30 has clear signposting (‘Look at the paragraph beginning: Then, in 2005…’), whereas in question 33 the relevant information is much harder to find – it’s a ‘match the summary of the paragraph to the order in which they occur’ question.
Secondly, questions may vary in terms of task-specific complexity. How much work does the student have to do to answer the question? Is it a simple information retrieval task or does the pupil have to use inference?
For example, question 7 is easy in this regard: ‘Write down three things you are told about the oak tree.’ The text clearly says the oak tree was ‘ancient’. I haven’t checked the mark scheme, as it’s not yet published as I write, but I am assuming that’s enough to earn 1 mark. Question 3, by contrast, is a bit harder: ‘How can you tell that Maria was very keen to get to the island?’ Students need to infer this from the fact that she said something ‘impatiently’. There are far fewer questions of this kind under the new regime – but we were expecting that, and the sample paper had demonstrated as much. Again – no complaints. The test specification does share the relative weightings of different skills (in section 6.2.2, table 9), but the bands are so wide that it’s all a bit meaningless. Inference questions, for example, can make up anywhere between 16% and 50% of all questions.
Thirdly, the response strategy can be more or less demanding: compare a one-word answer with a three-mark ‘explain your opinion’ question.
The two final ways to make questions more or less difficult are by varying the extent of vocabulary knowledge required by the question (strategy 5 in the specification document), or by varying the complexity of the target information needed to answer the question (strategy 2). The document goes on to explain that the latter means varying:
• the lexico-grammatical density of the stimulus
• the level of concreteness / abstractness of the target information
• the level of familiarity of the information needed to answer the question
and that ‘…there is a low level of semantic match between task wording and relevant information in the text.’
I’m not quite sure what the difference is between ‘lexico-grammatical density’ (strategy 2) and the knowledge of vocabulary required by the question (strategy 5), but the whole thrust of this piece is that the texts were pretty dense, both lexico-grammatically and in terms of the vocabulary needed to answer the questions. When compared with the sample test, for example, the contrast is stark. Now, I’m no expert in linguistics or test question methodology. I’m just a headteacher with an axe to grind, a weekend to waste and access to Google. But this has infuriated me enough to do a fair bit of reading around the subject.
On the Monday evening after the test, Twitter was alive with people quoting someone who had apparently said that the first paragraph of the first text had a Flesch-Kincaid reading level equivalent to that of a 15-year-old. I’d never heard of Flesch-Kincaid – or any of the other readability tests – so I did some research and found that the first paragraph of The Lost Queen was indeed described as suitable for 8th-9th graders, or 13-15 year olds in the British system. But there was also criticism online that different readability tests rated the same texts quite differently, and so weren’t a reliable indicator of much. (Someone put up a link to an article about this, which I foolishly forgot to bookmark and now can’t find – do share the link again if it was you or you know a good source.)*
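For anyone else meeting Flesch-Kincaid for the first time: the grade-level version is a simple formula built from average sentence length and average syllables per word, 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59, and the result is a US school grade. The Python sketch below is only an illustration of that formula, not the tool used for the figures in this post. Its syllable counter is a crude vowel-group heuristic (an assumption on my part), and differences in exactly this kind of counting are one plausible reason different online checkers rate the same passage differently.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count runs of vowels, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # 'make' has one syllable, but 'table' keeps its final 'le' syllable
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Short words in a short sentence give a very low (even negative) grade
print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))
```

Longer sentences and more polysyllabic words both push the grade up, which is why a single dense paragraph can score far above the rest of a text.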
Anyway, be that as it may, I decided to run readability tests on various bits and pieces of the SATs paper. This is what I discovered (texts listed in order of alleged difficulty):
The Lost Queen, first paragraph: 13-15 year olds
Wild Ride, first paragraph: 13-15 year olds
Wild Ride, ‘bewildered’ paragraph: 18-22 year olds
Way of the Dodo, first paragraph: 13-15 year olds (and a lower score than The Lost Queen)
Way of the Dodo, second paragraph: 13-15 year olds
So there you have it: insofar as Flesch-Kincaid has any reliability, the supposedly hardest text was in fact the easiest, and the middle text was the hardest.
I did the same with the sample paper. The first text had a readability level suitable for 11-12 year olds and the second for 13-15 year olds. I had lost the will to live by then, so didn’t do the third text – but it is clearly much more demanding than the previous two, as it should be.
I also used the Automated Readability Index, and while this gave slightly different age ranges, the relative difficulty was the same, and all the texts were pitched at children older than 11, the easiest being… the first part of Way of the Dodo.
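The Automated Readability Index works on a similar principle but counts characters per word instead of syllables: 4.71 × (characters ÷ words) + 0.5 × (words ÷ sentences) − 21.43, again giving a US grade. Here is a minimal Python sketch of that formula, assuming naive punctuation-based sentence splitting; it is an illustration, not the implementation behind any particular online checker.

```python
import re

def automated_readability_index(text: str) -> float:
    """ARI: 4.71*(characters/words) + 0.5*(words/sentences) - 21.43."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Count letters and digits only, so apostrophes are not treated as characters
    characters = sum(ch.isalnum() for w in words for ch in w)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43

easy = "The cat sat on the mat."
hard = "Arthritic giraffes pranced skittishly across the parched savannah."
print(automated_readability_index(easy) < automated_readability_index(hard))
```

Because both indices reward short words and short sentences, it is not surprising that they agree on which passage is hardest even when their age estimates differ.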
However, it was also clear from my reading that readability tests are designed to help people writing, say, pamphlets for the NHS to make their writing as transparent and easy as possible. In other words, they are intended to make reading simple, so that people who aren’t very good at it can understand information that may be very important. It struck me that maybe this wasn’t exactly what we should be aiming for in a reading assessment. After all, we do want some really challenging questions at some point. We just want them at the end, where they are meant to be. We need readability tests because previous generations have not been taught well enough to be presented with demanding information. We want better for the children we now teach.
Which brought me to this site, which ranks words by their relative frequency in the English language. If we are going to be held accountable for the sophistication of the vocabulary our children can comprehend, then surely there should be some bounds on that. While the figure is contested, it seems to be generally held that the average adult knows about 20,000 words. You can test yours here. How many words the average 11-year-old does or should know I did not discover – so here are my ballpark suggestions.
For the first text – the one that is meant to be easiest – there should be a cap on words ranked below 10,000. (I’m assuming here we understand that as words are used less frequently their ranking falls but the actual number rises: a ranking of 20,000 is lower than a ranking of 10,000. If this is not the correct convention for such matters, I apologise.) Definitions should be given for low-frequency words, especially if understanding them is critical to answering specific questions, just as ‘savannah’ was explained in the introduction to Wild Ride.
Then in the second text words could be limited to 15,000, and in the third 20,000 – representing the average adult’s vocabulary. I have plucked these figures from the air. I would not go to the stake for them. But you get my meaning. We need to pin down the domain of ‘vocabulary’ if we are to be held accountable when it is tested.
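To make the proposal concrete, here is a minimal Python sketch that checks a passage’s tricky words against the caps suggested above (10,000, 15,000 and 20,000, figures the post itself describes as plucked from the air). It uses the rankings for The Lost Queen from the table at the end of this post, where a bigger number means a rarer word; the function name and data layout are purely illustrative.

```python
# Illustrative caps, one per text in paper order, following the figures suggested above
CAPS = {1: 10_000, 2: 15_000, 3: 20_000}

# Frequency rankings for the words flagged in The Lost Queen (text 1); higher = rarer
LOST_QUEEN = {
    "monument": 4106, "ancestor": 4178, "clack": 32_467, "hush": 15_394,
    "haze": 8307, "vine": 5746, "murky": 9265, "weather": 9568,
    "mossy": 19_480, "inscription": 9164, "slab": 7585,
}

def words_over_cap(ranks: dict[str, int], cap: int) -> list[str]:
    """Return the words ranked beyond the cap, rarest first; these would need a definition."""
    over = [word for word, rank in ranks.items() if rank > cap]
    return sorted(over, key=lambda word: ranks[word], reverse=True)

print(words_over_cap(LOST_QUEEN, CAPS[1]))  # ['clack', 'mossy', 'hush']
```

Under a 10,000 cap, then, clack, mossy and hush would each need either a definition in the text or a more common substitute.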
For what it is worth, I asked our year 6 after the test to tell me which words they did not know. There are 30 children in the class. Words that half the class or more did not know the meaning of included, from the first text: monument, haze, weathered (as a verb); from the second text: jockey, dam, promptly, sedately (zero children), counselled, arthritic, nasal, pranced, skittishly (zero children), milled, bewildered, spindly, momentum; and from the third text: haven, oasis (they knew this was a brand of drink, though), parched, receding, rehabilitate and anatomy. My Geordie partner tells me they would have known parched if they were northern, because that’s Geordie for ‘I’m really thirsty.’ Here we can see again that the middle passage had the highest number of unknown words in my obviously unrepresentative sample. In fact, it was the first paragraph on page 8, henceforth to be known as the bewildering paragraph, that seemed to have the highest lexico-grammatical density. As the mud flats entrapped the Mauritian dodos, so did this paragraph ensnare our readers, slowing them down to the extent that they failed to finish the questions on the relatively easy final text.
Maybe I’m wrong. Maybe when the statistics are finally in, there won’t be a starker-than-usual demarcation along class lines. I’d love to be wrong. Let’s hope I am.
And finally, what you’ve all been waiting for: what was the lowest-ranking word? Well, yes, of course, it was ‘skittishly’, so rare it doesn’t even appear in the database of 60,000 words I was using. But suitable for 11-year-olds, apparently.
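In case you are interested, here are the full rankings. Where the word might be more familiar as a different part of speech, I have included a ranking for that form too (originally in italics; these are the rows with the Text column left blank). The words I chose to rank were just those my deputy and I thought children might find tricky. In the Text column, TLQ is The Lost Queen, WR is Wild Ride and WD is Way of the Dodo.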
Word (organized by rank, lowest to highest) | Part of speech | Ranking | Text | Necessary (N)/useful (U) for question number |
skittishly | adverb | <60,000 | WR | |
parch | adjective | 46,169 | WD | E29 |
sedately | adverb | 38,421 | WR | |
clack | noun | 32,467 | TLQ | 4 distractor |
sedate | verb | 23,110 | ||
prance | verb | 22,360 | WR | |
skittish | adjective | 21,298 | ||
sedate | adjective | 20,481 | ||
arthritic | adjective | 20,107 | WR | |
mossy | adjective | 19,480 | TLQ | U8 distractor, E9 |
misjudge | verb | 19140 | WD | |
spindly | adjective | 19025 | WR | |
bewilderment | noun | 17,410 | WR | E16 |
squeal | noun | 17,103 | WR | |
sternly | adverb | 16,117 | WR | |
plod | verb | 16,053 | WR | |
prey | verb | 15,771 | WD | |
dismount | verb | 15,601 | WR | |
hush | noun | 15,394 | TLQ | N4 |
burrow | noun | 14,900 | WR | |
hush | verb | 14,295 | ||
nocturnal | adjective | 13,755 | WR | E14 |
enraged | adjective | 13,378 | WR | |
mill | verb | 13,378 | WR | E16 |
sprint | noun | 12,187 | WR | |
squeal | verb | 12036 | ||
folklore | noun | 11,722 | WD | |
rehabilitate | verb | 11,496 | WD | E31 |
oasis | noun | 10,567 | WD | U29 |
stern | adjective | 10,377 | ||
moss | noun | 10142 | ||
evade | verb | 9759 | WR | |
sight | verb | 9730 | WD | |
jockey | noun | 9723 | WR | |
weather | verb | 9568 | TLQ | U8 |
blur | noun | 9319 | WR | |
murky | adjective | 9265 | TLQ | U6 |
inscription | noun | 9164 | TLQ | U8 |
rein | noun | 8793 | WR | |
sprint | verb | 8742 | ||
slaughter | noun | 8494 | WD | |
anatomy | noun | 8310 | WD | E32 |
haze | noun | 8,307 | TLQ | 4 distractor |
counsel | verb | 7905 | WR | |
nasal | adjective | 7857 | WR | |
recede | verb | 7809 | WD | |
intent | adjective | 7747 | WR | |
stubborn | adjective | 7680 | WR question | E15 |
slab | noun | 7585 | TLQ | U8 |
arthritis | noun | 7,498 | WR | |
promptly | adverb | 6762 | WR | |
blur | verb | 6451 | ||
haven | noun | 5770 | WD | |
vine | noun | 5746 | TLQ | U6 |
defy | verb | 5648 | WR | E15 |
startle | verb | 5517 | WR | |
drought | noun | 5413 | WD | |
remains | noun | 5375 | WD | |
devastating | adjective | 4885 | WD | |
rehabilitation | noun | 4842 | ||
prey | noun | 4533 | ||
dam | noun | 4438 | WR | |
momentum | noun | 4400 | WR | E18 |
ancestor | noun | 4178 | TLQ | N1 |
monument | noun | 4106 | TLQ | U8 |
dawn | noun | 4044 | WR | N12a |
intent | noun | 3992 | ||
click | noun | 3822 | ||
counsel | noun | 3441 | ||
indication | noun | 3401 | WD | |
prompt | adjective | 3142 | ||
mount | verb | 3012 | ||
urge | verb | 2281 | WR | |
cast | verb | 2052 | WR | |
judge | verb | 1764 | ||
unique | adjective | 1735 | WD | E25 |
weather | noun | 1623 | ||
sight | noun | 1623 | ||
Word (organized by where they appear in the texts) | Part of speech | Ranking | Text | Necessary (N)/useful (U) for question number |
monument | noun | 4106 | TLQ | U8 |
ancestor | noun | 4178 | TLQ | N1 |
clack | noun | 32,467 | TLQ | 4 distractor |
hush | noun | 15,394 | TLQ | N4 |
hush | verb | 14,295 | ||
haze | noun | 8,307 | TLQ | 4 distractor |
vine | noun | 5746 | TLQ | U6 |
murky | adjective | 9265 | TLQ | U6 |
weather | verb | 9568 | TLQ | U8 distractor, E9 |
weather | noun | 1623 | ||
mossy | adjective | 19,480 | TLQ | U8 distractor, E9 |
moss | noun | 10142 | ||
inscription | noun | 9164 | TLQ | U8 |
slab | noun | 7585 | TLQ | U8 |
dawn | noun | 4044 | WR | N12a |
cast | verb | 2052 | WR | |
jockey | noun | 9723 | WR | |
dam | noun | 4438 | WR | |
startle | verb | 5517 | WR | |
nocturnal | adjective | 13,755 | WR | E14 |
promptly | adverb | 6762 | WR | |
prompt | adjective | 3142 | ||
stubborn | adjective | 7680 | WR question | E15 |
defy | verb | 5648 | WR | |
sedately | adverb | 38,421 | WR | |
sedate | verb | 23,110 | ||
sedate | adjective | 20,481 | ||
plod | verb | 16,053 | WR | |
arthritic | adjective | 20,107 | WR | |
arthritis | noun | 7,498 | WR | |
nasal | adjective | 7857 | WR | |
squeal | noun | 17,103 | WR | |
squeal | verb | 12036 | ||
burrow | noun | 14,900 | WR | |
prance | verb | 22,360 | WR | |
skittishly | adverb | <60,000 | WR | |
intent | adjective | 7747 | WR | |
intent | noun | 3992 | ||
enraged | adjective | 13,378 | WR | |
squeal | noun | 17,103 | WR | |
squeal | verb | 12036 | ||
mill | verb | 13,378 | WR | E16 |
bewilderment | noun | 17,410 | WR | E16 |
spindly | adjective | 19025 | WR | |
evade | verb | 9759 | WR | |
momentum | noun | 4400 | WR | E18 |
urge | verb | 2281 | WR | |
sprint | verb | 8742 | ||
sprint | noun | 12,187 | WR | |
blur | noun | 9319 | WR | |
blur | verb | 6451 | ||
dismount | verb | 15,601 | WR | |
mount | verb | 3012 | ||
sight | verb | 9730 | WD | |
sight | noun | 1623 | ||
haven | noun | 5770 | WD | |
slaughter | noun | 8494 | WD | |
unique | adjective | 1735 | WD | E25 |
prey | verb | 15,771 | WD | |
prey | noun | 4533 | ||
folklore | noun | 11,722 | WD | |
remains | noun | 5375 | WD | |
drought | noun | 5413 | WD | |
oasis | noun | 10,567 | WD | U29 |
parch | adjective | 46,169 | WD | E29 |
recede | verb | 7809 | WD | |
rehabilitate | verb | 11,496 | WD | E31 |
indication | noun | 3401 | WD | |
anatomy | noun | 8310 | WD | E32 |
misjudge | verb | 19140 | WD | |
judge | verb | 1764 | ||
devastating | adjective | 4885 | WD | |
By way of contrast, I did the same with the sample paper. In the first text there were no words I thought hard enough to check. In the second there were 4: cover (15,363), pitiful (13,211), brittle (10,462) and emerald (12,749). In the third and final passage there were 8: triumphantly (16,343), glade (20,257), unwieldy (16,922), sapling (16,313), foliage (7,465), lurch (9,339), ecstasy (9,629) and finally, ranking off the scale below 60,000, gambols.