Milling around in bewilderment: that reading comprehension.

You can see the reading test for yourself from this link

The day started well; dawn casting spun-gold threads across a rosy sky. The long wait was over; SATs week was finally here. And it looked like summer had arrived. Year 6 tripped into classrooms while head teachers fumbled skittishly with secret keys in hidden cupboards. Eventually teachers across the nation ripped open plastic packets. Perhaps at first their fears were calmed, for the text – or what you can glean about it from reading snippets here and there as you patrol the rows – didn’t seem too bad. In previous weeks children had struggled with excerpts from The Lady of Shalott, Moonfleet, Shakespeare. The language here looked far more contemporary.

But no. Upon completion children declared the test was hard – really hard. Many hadn’t finished – including children who usually tore through tests like a…white giraffe? What is more, the texts didn’t seem to be in any kind of order. We had drilled into them, as per the test specification guide, that the texts would increase in difficulty throughout the paper (section 6.2). Yet the middle text was almost universally found to be the hardest. Some declared the final text the easiest. What was going on?

Tests safely dispatched, I decided to take a proper look. It didn’t take long for it to be apparent that the texts contained demanding vocabulary and some tortuous sentence structure. The difference with the sample test material was stark. Twitter was alive with tales of sobbing kids and angry teachers. Someone said they had analysed the first paragraph of the first text and it came out with a reading age of 15. Debate followed; was this really true or just a rumour? Were readability tests reliable?

I tweeted that it was a test of how middle class and literary one’s parents were, having identified 45 words I reckoned might challenge our inner city children. After all, as a colleague remarked, ‘my three-year-old knows more words than some children here’. Other people drew groans by mentioning how irrelevant the texts were to the kind of lives their children lived. I seemed to be implicated in this criticism…although it’s difficult to tell who’s criticising who sometimes on Twitter. Still, I was put out. I don’t care if texts are ‘relevant’, I retorted. I cared that the vocabulary needed to answer questions favoured a ‘posh’ demographic. Apparently, this was patronising. I saw red at this point! It’s not that poorer children can’t acquire a rich vocabulary, but that since it is well known that a rich vocabulary is linked to parental income and the domain ‘rich vocabulary’ is huge (and undefined), it is not fair or useful to use tests that rely on good vocabulary for accountability. And then I put a link to this previous post of mine, where I’ve explained this in more depth. If accountability tests over-rely on assessing vocabulary as a proxy for assessing reading, this hands a free pass to schools chock-full of children like my colleague’s three-year-old, since such children arrive already stuffed to the gills with eloquence and articulacy, whereas the poorer the intake, the greater the uphill struggle to enable the acquisition of the kind of cultural capital richer children imbibe with their mother’s milk.

Flawed as the previous reading tests were, they did not stack the cards against schools serving language-poor populations. The trouble with using vocabulary as a measure is that each individual word is so specific. Usually what we teach is generalisable from one context to another. Learning words, however, has to be done on a case-by-case basis. I recently taught year 6 somnolent, distraught and clandestine, among many others. I love teaching children new words, and they love acquiring them. But unless there is some sort of finite list against which we are to be judged, I’d rather not have our school judged by a test that is hard to pass without an expansive and sophisticated vocabulary. With the maths and SPAG tests, we know exactly what is going to be tested. The domain is finite. We worry about how to teach it so it is understood and remembered, but we do not worry that some arcane bit of maths will worm its way into the test. Nautical miles, for example. Not so with reading. Any word within the English language is fair game – including several that don’t regularly appear in the vocabulary of the average adult. There may be very good reasons for the government to want to ascertain the breadth of vocabulary acquisition across the nation. In which case, they could instigate a vocabulary test – maybe something along the lines of this. But that shouldn’t be confused with the ability to read. To return to our earlier example, my colleague’s three-year-old may have an impressive vocabulary but she can’t actually read much at all yet. Whereas our 11-year-olds may not know as many words but are happily enjoying reading the Morris Gleitzman ‘Once’ series.

It is becoming accepted that reading is not just the orchestration of a set of skills, but requires knowledge of the context for the reader to make sense of the bigger picture. But that’s not what happened here. It’s not the case that children found the texts difficult because they lacked knowledge of the context. The context of the first text was two children exploring outdoors. True, only 50% of our present year 6 knew what a monument was at the outset – a bit tricky since this was pretty central to the test – but by the end of the story they sort of worked it out for themselves. The second text featured a young girl disobeying her grandmother and taking risks. And a giraffe. Well, I reckon this is pretty familiar territory (grandmothers and risks, I mean) and while we do not meet giraffes every day in Bethnal Green, we know what they are. The third and final text told us all about dodos and how they may have been unfairly maligned by Victorian scientists. So that was a bit more remote from everyday experience but not so terribly outlandish as to render the text impenetrable. The third text is meant to be harder. The children are meant to have studied evolution and extinction by then in science anyway. So it wasn’t that the Sitz im Leben was so abstruse as to render comprehension impossible. The problem was the words used within the texts and the high number of questions which were dependent upon knowing what those words meant. The rather convoluted sentence structure in the second text didn’t help either – but if the words had been more familiar, children might have stood more of a fighting chance.

According to the test specification, questions can be difficult in one of five different ways. These five ways are based on research commissioned by the PISA guys. It’s an interesting  and informative read – so I’m not arguing with the methodology per se.  I don’t know nearly enough to even attempt that. Amateur though I am, I do argue with the relative proportions allocated to each of the five strategies in this test.

With three of these, I have no quarrel. Firstly (and my ordering is different from that in the document), questions can be made easier or harder in terms of accessibility: how easy is it to find the information? Is the student signposted to it (e.g. ‘see the first paragraph on page 2’)? Or is the question difficulty raised by not signposting and possibly by having distractor items to lure students down dead ends? I think we have little to complain about here. For example, question 30 has clear signposting (‘Look at the paragraph beginning: Then, in 2005…’), whereas in question 33 the relevant information is much harder to find – it’s a ‘match the summary of the paragraph to the order in which they occur’ question.

Secondly, questions may vary in terms of task-specific complexity. How much work does the student have to do to answer the question?  Is it a simple information retrieval task or does the pupil have to use inference?

For example, question 7 is easy in this regard: ‘Write down three things you are told about the oak tree.’ The text clearly says the oak tree was ‘ancient’. I haven’t checked the mark scheme as it’s not yet published as I write, but I am assuming that’s enough to earn you 1 mark. Question 3, by contrast, is a bit harder: ‘How can you tell that Maria was very keen to get to the island?’ Students need to infer this from the fact that she said something ‘impatiently’. There are far fewer of this kind of question under this new regime – but we were expecting that and the sample paper demonstrated that. Again – no complaints. Indeed the test specification does share the relative weightings of different skills (in section 6.2.2, table 9), but the bands are so wide it’s all a bit meaningless. Inference questions can make up between 16% and 50% of all questions, for example.

Thirdly, the response strategy can be more or less demanding: a one-word answer versus a three-mark ‘explain your opinion’ question.

The two final ways to make questions more or less difficult are by either varying the extent of knowledge of vocabulary required by the question (strategy 5 in the specification document) or by varying the complexity of the target information that is needed to answer the question (strategy 2). The document goes on to explain that this is done by varying

• the lexico-grammatical density of the stimulus

• the level of concreteness / abstractness of the target information

• the level of familiarity of the information needed to answer the question

and that ‘There is a low level of semantic match between task wording and relevant information in the text.’

I’m not quite sure what the difference is between ‘lexico-grammatical density’ (strategy 2) and knowledge of vocabulary required by the question (strategy 5), but the whole thrust of this piece is that the texts were pretty dense lexico-grammatically and in terms of the vocabulary needed to answer the questions. When compared with the sample test, for example, the contrast is stark. Now I’m no expert in linguistics or test question methodology. I’m just a headteacher with an axe to grind, a weekend to waste and access to Google. But this has infuriated me enough to do a fair bit of reading around the subject.

On the Monday evening post-test, Twitter was alive with people quoting someone who apparently had said that the first paragraph of the first text had a Flesch Kincaid reading ease equivalent to that of a 15 year old. I’d never heard of Flesch Kincaid – or any other of the readability tests – so I did some research and found out that indeed, the first paragraph of The Lost Queen was described as suitable for 8th-9th graders – or 13-15 year olds in the British system. But there was also criticism online that the readability tests rated the same texts quite differently so weren’t a reliable indicator of much. (Someone put a link up to an article about this, which I foolishly forgot to bookmark and now can’t find – do share the link again if it was you or you know a good source.)*

Anyway, be that as it may, I decided to do some readability tests of various bits and pieces of the SATs paper. And this is what I discovered. (Texts are listed in order of alleged difficulty.)

The Lost Queen, first paragraph: 13-15 year olds

Wild Ride, first paragraph: 13-15 year olds

Wild Ride, ‘bewildered’ paragraph: 18-22 year olds

Way of the Dodo, first paragraph: 13-15 year olds (and lower score than The Lost Queen)

Way of the Dodo, second paragraph: 13-15 year olds

So there you have it: insofar as Flesch Kincaid has any reliability, the supposedly hardest text was in fact the easiest and the middle text was the hardest.
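For anyone who wants to repeat this kind of check on a paragraph, here is a minimal sketch in Python of the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), together with the usual mapping from score to US school grade. It is not the tool used for the figures above. The syllable counter is a crude vowel-group heuristic of my own and the function names are made up for illustration, so scores will differ a little from those given by online calculators.

```python
import re

def count_syllables(word: str) -> int:
    """Rough estimate (an assumption, not a dictionary lookup):
    count runs of consecutive vowels, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

def us_grade_band(score: float) -> str:
    """Map a Reading Ease score to the conventional US grade band."""
    bands = [(90, "5th grade"), (80, "6th grade"), (70, "7th grade"),
             (60, "8th-9th grade"), (50, "10th-12th grade"), (30, "college")]
    for floor, label in bands:
        if score >= floor:
            return label
    return "college graduate"

if __name__ == "__main__":
    sample = "The cat sat."
    score = flesch_reading_ease(sample)
    print(round(score, 2), us_grade_band(score))
```

A score of 60 to 70 falls in the 8th-9th grade band, which is the band quoted above for the first paragraph of The Lost Queen.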

I did the same with the sample paper. The first text had a readability level of an 11-12 year-old and the second 13-15. I had lost the will to live by then so didn’t do the third text – but it is clearly much more demanding than the previous two – as it should be.

I also used the Automated Readability Index and while this gave slightly different age ranges, the relative difficulty was the same and all the texts were for children older than 11, the easiest being… the first part of Way of the Dodo.
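The Automated Readability Index works from characters rather than syllables, which makes it simpler to compute: 4.71 × (characters per word) + 0.5 × (words per sentence) - 21.43, giving an approximate US grade level. The sketch below is an illustration only, not the calculator used for this post. The function name and the simple way it splits sentences and words are assumptions, so results may differ slightly from online tools.

```python
import re

def automated_readability_index(text: str) -> float:
    """Standard ARI formula; the result approximates a US grade level."""
    # Assumption: sentences end at '.', '!' or '?'; words are runs of
    # letters, digits or apostrophes (apostrophes count as characters).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    characters = sum(len(w) for w in words)
    return (4.71 * (characters / len(words))
            + 0.5 * (len(words) / len(sentences))
            - 21.43)

if __name__ == "__main__":
    print(round(automated_readability_index("The cat sat."), 2))
```

Very short, simple sentences can produce a negative index, which just means the text sits below the bottom of the scale.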

However,  it was also clear from my reading that readability tests are designed to help people writing, say pamphlets for the NHS, make the writing as transparent and easy as possible. In other words, they are intended to make reading simple so people who aren’t very good at it can understand stuff that may be very important. It struck me that maybe this wasn’t exactly what we should be aiming for in a reading assessment. After all, we do want some really challenging questions at some point. We just want them  at the end, where they are meant to be. We need readability tests because previous generations have not been taught well enough to be presented with demanding information. We want better for the children we now teach.

Which brought me to discover this site, which ranks words by their relative frequency in the English language. If we are going to be held accountable for the sophistication of the vocabulary our children can comprehend, then surely there should be some bounds on that. While the authority of this is contested, it seems to be generally held that the average adult knows about 20,000 words. You can test yours here. How many words the average 11 year old does or should know I did not discover – so here are my ball park suggestions.

For the first text – the one that is meant to be easier – there should be a cap that excludes words ranked below 10,000. (I’m assuming here we understand that as words are used less frequently their ranking falls but the actual number rises: a ranking of 20,000 is lower than a ranking of 10,000. If this is not the correct convention for such matters, I apologise.) Definitions should be given for low-frequency words, especially if understanding them is critical to answering specific questions, in the same way that ‘savannah’ was explained in the introduction to Wild Ride.

Then in the second text words could be limited to 15,000, and in the third 20,000 – representing the average adult’s vocabulary. I have plucked these figures from the air. I would not go to the stake for them. But you get my meaning. We need to pin down the domain of ‘vocabulary’ if we are to be held accountable when it is tested.
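To make the idea concrete, here is a small Python sketch of how such caps could be checked. The caps are the ball-park figures suggested above and the ranks are a handful taken from the table further down this post. The abbreviations TLQ, WR and WD (read here as The Lost Queen, Wild Ride and Way of the Dodo), the function name and the use of `None` for a word missing from the 60,000-word database are all assumptions made for illustration.

```python
# Suggested caps per text: first, second and third text respectively.
CAPS = {"TLQ": 10_000, "WR": 15_000, "WD": 20_000}

# (word, frequency rank, text). None means "not in the 60,000-word database".
SAMPLE_WORDS = [
    ("monument", 4106, "TLQ"),
    ("clack", 32467, "TLQ"),
    ("hush", 15394, "TLQ"),
    ("sedately", 38421, "WR"),
    ("skittishly", None, "WR"),
    ("momentum", 4400, "WR"),
    ("parch", 46169, "WD"),
    ("oasis", 10567, "WD"),
]

def words_over_cap(entries, caps):
    """Return the words that are rarer than the cap for their text.

    A bigger rank number means a rarer word, so a word breaks the cap
    when its rank number is greater than the cap (or it has no rank).
    """
    flagged = []
    for word, rank, text in entries:
        if rank is None or rank > caps[text]:
            flagged.append(word)
    return flagged

if __name__ == "__main__":
    print(words_over_cap(SAMPLE_WORDS, CAPS))
```

On this sample the words flagged are clack, hush, sedately, skittishly and parch: the ones a test setter would need either to replace or to define on the page.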

For what it is worth, I asked our year 6 after the test to tell me which words they did not know. There are 30 children in the class. Words where half the class or more did not know the meaning included, from the first text: monument, haze, weathered (as a verb); from the second text: jockey, dam, promptly, sedately (zero children), counselled, arthritic, nasal, pranced, skittishly (zero children), milled, bewildered, spindly, momentum; from the third text: haven, oasis (they knew this was a brand of drink though), parched, receding, rehabilitate and anatomy. My Geordie partner tells me they would have known parched if they were northern because that’s Geordie for ‘I’m really thirsty.’ Here we can see again that the middle passage had the highest number of unknown words in my obviously unrepresentative sample. In fact, it was the first paragraph on page 8, which henceforth shall be known as the bewildering paragraph, that seemed to have the highest lexico-grammatical density. As the mud flats entrapped the Mauritian dodos, so did this paragraph ensnare our readers, slowing them down to the extent that they failed to finish the questions pertaining to the relatively easy final text.

Maybe I’m wrong. Maybe when the statistics are finally in, there won’t be a starker-than-usual demarcation along class lines. I’d love to be wrong. Let’s hope I am.

And finally, what you’ve all been waiting for – what was the lowest ranking word? Well yes of course, it was ‘skittishly’; so rare it doesn’t even appear in the database of 60,000 words I was using. But suitable for 11 year olds, apparently.

In case you are interested, here’s the full rankings. Where the word might be more familiar as a different part of speech I have included a ranking for that word too, in italics. The words I chose to rank were just those my deputy and I thought children might find tricky.

Word (organized by rank, lowest to highest) Part of speech Ranking Text Necessary (N)/useful (U) for question number
skittishly adverb <60,000 WR
parch adjective 46,169 WD E29
sedately adverb 38,421 WR
clack noun 32,467 TLQ 4 distractor
sedate verb 23,110
prance verb 22,360 WR
skittish adjective 21,298
sedate adjective 20,481
arthritic adjective 20,107 WR
mossy adjective 19,480 TLQ U8 distractor, E9
misjudge verb 19140 WD
spindly adjective 19025 WR
bewilderment noun 17,410 WR E16
squeal noun 17,103 WR
sternly adverb 16,117 WR
plod verb 16,053 WR
prey verb 15,771 WD
dismount verb 15,601 WR
hush noun 15,394 TLQ N4
burrow noun 14,900 WR
hush verb 14,295
nocturnal adjective 13,755 WR E14
enraged adjective 13,378 WR
mill verb 13,378 WR E16
sprint noun 12,187 WR
squeal verb 12036
folklore noun 11,722 WD
rehabilitate verb 11,496 WD E31
oasis noun 10,567 WD U29
stern adjective 10,377
moss noun 10142
evade verb 9759 WR
sight verb 9730 WD
jockey noun 9723 WR
weather verb 9568 TLQ U8
blur noun 9319 WR
murky adjective 9265 TLQ U6
inscription noun 9164 TLQ U8
rein noun 8793 WR
sprint verb 8742
slaughter noun 8494 WD
anatomy noun 8310 WD E32
haze noun 8,307 TLQ 4 distractor
counsel verb 7905 WR
nasal adjective 7857 WR
recede verb 7809 WD
intent adjective 7747 WR
stubborn adjective 7680 WR E15
slab noun 7585 TLQ U8
arthritis noun 7,498 WR
promptly adverb 6762 WR
blur verb 6451
haven noun 5770 WD
vine noun 5746 TLQ U6
defy verb 5648 WR E15
startle verb 5517 WR
drought noun 5413 WD
remains noun 5375 WD
devastating adjective 4885 WD
rehabilitation noun 4842
prey noun 4533
dam noun 4438 WR
momentum noun 4400 WR E18
ancestor noun 4178 TLQ N1
monument noun 4106 TLQ U8
dawn noun 4044 WR N12a
intent noun 3992
click noun 3822
counsel noun 3441
indication noun 3401 WD
prompt adjective 3142
mount verb 3012
urge verb 2281 WR
cast verb 2052 WR
judge verb 1764
unique adjective 1735 WD E25
weather noun 1623
sight noun 1623
Word (organized by where they appear in the texts) Part  of speech Ranking Text Necessary (N)/useful (U) for question number
monument noun 4106 TLQ U8
ancestor noun 4178 TLQ N1
clack noun 32,467 TLQ 4 distractor
hush noun 15,394 TLQ N4
hush verb 14,295
haze noun 8,307 TLQ 4 distractor
vine noun 5746 TLQ U6
murky adjective 9265 TLQ U6
weather verb 9568 TLQ U8 distractor, E9
weather noun 1623
mossy adjective 19,480 TLQ U8 distractor, E9
moss noun 10142
inscription noun 9164 TLQ U8
slab noun 7585 TLQ U8
dawn noun 4044 WR N12a
cast verb 2052 WR
jockey noun 9723 WR
dam noun 4438 WR
startle verb 5517 WR
nocturnal adjective 13,755 WR E14
promptly adverb 6762 WR
prompt adjective 3142
stubborn adjective 7680 WR E15
defy verb 5648 WR
sedately adverb 38,421 WR
sedate verb 23,110
sedate adjective 20,481
plod verb 16,053 WR
arthritic adjective 20,107 WR
arthritis noun 7,498 WR
nasal adjective 7857 WR
squeal noun 17,103 WR
squeal verb 12036
burrow noun 14,900 WR
prance verb 22,360 WR
skittishly adverb <60,000 WR
intent adjective 7747 WR
intent noun 3992
enraged adjective 13,378 WR
squeal noun 17,103 WR
squeal verb 12036
mill verb 13,378 WR E16
bewilderment noun 17,410 WR E16
spindly adjective 19025 WR
evade verb 9759 WR
momentum noun 4400 WR E18
urge verb 2281 WR
sprint verb 8742
sprint noun 12,187 WR
blur noun 9319 WR
blur verb 6451
dismount verb 15,601 WR
mount verb 3012
sight verb 9730 WD
sight noun 1623
haven noun 5770 WD
slaughter noun 8494 WD
unique adjective 1735 WD E25
prey verb 15,771 WD
prey noun 4533
folklore noun 11,722 WD
remains noun 5375 WD
drought noun 5413 WD
oasis noun 10,567 WD U29
parch adjective 46,169 WD E29
recede verb 7809 WD
rehabilitate verb 11,496 WD E31
indication noun 3401 WD
anatomy noun 8310 WD E32
misjudge verb 19140 WD
judge verb 1764
devastating adjective 4885 WD

By way of contrast I did the same with the sample test. In the first text there were no words I thought were hard enough to check. In the second there were 4: cover (15,363), pitiful (13,211), brittle (10,462) and emerald (12,749). In the third and final passage there were 8: triumphantly (16,343), glade (20,257), unwieldy (16,922), sapling (16,313), foliage (7,465), lurch (9,339), ecstasy (9,629) and finally, ranking off the scale below 60,000, gambols.


12 thoughts on “Milling around in bewilderment: that reading comprehension.”

  1. Garry Minto says:

    Really interesting article. It looks as though the reading test was very poorly designed indeed, with little thought given to how children might approach the reading, and little research or even thought given to defining the knowledge domain. The Maths papers, with a few bits of weirdness, by and large at least had a clear knowledge domain. I need to check it, but I believe most newspapers are written for reading ages below 15. Michael Rosen has written lots about how norm referenced tests like SATs are designed to produce failure. For him, that is their purpose. The government, he says, needs some schools to fail in order to push through the academies programme. I’m not sure there is enough competence displayed in the test design here to support a conspiracy theory, and I do hope, like Clare, that demographic analysis will show that kids from poorer backgrounds, with less access to a ‘rich vocabulary’, will not be greatly disadvantaged. I love the idea that some Geordie kids will still know ‘parched’. Oh, and in my day, ‘a dod’ is what we called Frank Clarke the then Newcastle United fullback.


  2. SPAG & Grammar says:

    Please sort the spelling error! “True only 50% of our present year 6 new what a monument was at the outset”


  3. @MrTRoach says:

    Loving your work. It was a nightmare of a test for my Year 6s too. The language-poor children really struggled, which includes those with EAL (although not all by any means) and poor White British.

    Couldn’t find you on Twitter, so here’s the link to the Daniel Willingham article on readability scales that I posted after the test.

    http://www.danielwillingham.com/daniel-willingham-science-and-education-blog/evaluating-readability-measures

    Doug Lemov also mentions some of the unreliability and odd results in Reading Reconsidered.


  4. […] There was also no mention of the reading test.  There were hints, as the words ‘more challenging‘ and ‘stretching‘ were emphasised. However, none of the tears experienced by year six children last year  were mentioned, nor was the huge hike in requirement for vocabulary. (OldPrimaryTimer has done some very interesting work on the readability of the texts here.) […]

