Milling around in bewilderment: that reading comprehension.

You can see the reading test for yourself from this link.

The day started well; dawn casting spun-gold threads across a rosy sky. The long wait was over; SATs week was finally here. And it looked like summer had arrived. Year 6 tripped into classrooms while head teachers fumbled skittishly with secret keys in hidden cupboards. Eventually teachers across the nation ripped open plastic packets. Perhaps at first their fears were calmed, for the text – or what you can glean about it from reading snippets here and there as you patrol the rows – didn’t seem too bad. In previous weeks children had struggled with excerpts from The Lady of Shalott, Moonfleet, Shakespeare. The language here looked far more contemporary.

But no. Upon completion children declared the test was hard – really hard. Many hadn’t finished – including children who usually tore through tests like a…white giraffe? What is more, the texts didn’t seem to be in any kind of order. We had drilled into them, as per the test specification guide, that the texts would increase in difficulty throughout the paper (section 6.2). Yet the middle text was almost universally found to be the hardest. Some declared the final text the easiest. What was going on?

Tests safely dispatched, I decided to take a proper look. It didn’t take long to see that the texts contained demanding vocabulary and some tortuous sentence structure. The difference from the sample test material was stark. Twitter was alive with tales of sobbing kids and angry teachers. Someone said they had analysed the first paragraph of the first text and it came out with a reading age of 15. Debate followed; was this really true or just a rumour? Were readability tests reliable?

I tweeted that it was a test of how middle class and literary one’s parents were, having identified 45 words I reckoned might challenge our inner city children. After all, as a colleague remarked, ‘my three-year-old knows more words than some children here’. Other people drew groans by mentioning how irrelevant the texts were to the kind of lives their children lived. I seemed to be implicated in this criticism…although it’s difficult to tell who’s criticising whom sometimes on Twitter. Still, I was put out. I don’t care if texts are ‘relevant’, I retorted. I cared that the vocabulary needed to answer questions favoured a ‘posh’ demographic. Apparently, this was patronising. I saw red at this point!

It’s not that poorer children can’t acquire a rich vocabulary. But it is well known that a rich vocabulary is linked to parental income, and the domain ‘rich vocabulary’ is huge (and undefined), so it is not fair or useful to use tests that rely on good vocabulary for accountability. And then I put a link to this previous post of mine, where I’ve explained this in more depth. If accountability tests over-rely on assessing vocabulary as a proxy for assessing reading, this hands a free pass to schools chock-full of children like my colleague’s three-year-old, since such children arrive already stuffed to the gills with eloquence and articulacy. The poorer the intake, by contrast, the greater the uphill struggle to enable the acquisition of the kind of cultural capital richer children imbibe with their mother’s milk.

Flawed as the previous reading tests were, they did not stack the cards against schools serving language-poor populations. The trouble with using vocabulary as a measure is that each individual word is so specific. Usually what we teach is generalisable from one context to another. Learning words, however, has to be done on a case by case basis. I recently taught year 6 somnolent, distraught and clandestine, among many others. I love teaching children new words, and they love acquiring them. But unless there is some sort of finite list against which we are to be judged, I’d rather not have our school judged by a test that is hard to pass without an expansive and sophisticated vocabulary.

With the maths and SPAG tests, we know exactly what is going to be tested. The domain is finite. We worry about how to teach it so it is understood and remembered, but we do not worry that some arcane bit of maths will worm its way into the test. Nautical miles, for example. Not so with reading. Any word within the English language is fair game – including several that don’t regularly appear in the vocabulary of the average adult.

There may be very good reasons for the government to want to ascertain the breadth of vocabulary acquisition across the nation. In which case, they could instigate a vocabulary test – maybe something along the lines of this. But that shouldn’t be confused with the ability to read. To return to our earlier example, my colleague’s three-year-old may have an impressive vocabulary but she can’t actually read much at all yet, whereas our 11-year-olds may not know as many words but are happily enjoying reading the Morris Gleitzman ‘Once’ series.

It is becoming accepted that reading is not just the orchestration of a set of skills, but requires knowledge of the context for the reader to make sense of the bigger picture. But that’s not what happened here. It’s not the case that children found the texts difficult because they lacked knowledge of the context. The context of the first text was two children exploring outdoors. True, only 50% of our present year 6 knew what a monument was at the outset – a bit tricky since this was pretty central to the test – but by the end of the story they sort of worked it out for themselves. The second text featured a young girl disobeying her grandmother and taking risks. And a giraffe. Well, I reckon this is pretty familiar territory (grandmothers and risks, I mean) and while we do not meet giraffes every day in Bethnal Green, we know what they are. The third and final text told us all about dodos and how they may have been unfairly maligned by Victorian scientists. So that was a bit more remote from everyday experience but not so terribly outlandish as to render the text impenetrable. The third text is meant to be harder. The children are meant to have studied evolution and extinction by then in science anyway. So it wasn’t that the Sitz im Leben was so abstruse as to render comprehension impossible. The problem was the words used within the texts and the high number of questions which were dependent upon knowing what those words meant. The rather convoluted sentence structure in the second text didn’t help either – but if the words had been more familiar, children might have stood more of a fighting chance.

According to the test specification, questions can be difficult in one of five different ways. These five ways are based on research commissioned by the PISA guys. It’s an interesting and informative read – so I’m not arguing with the methodology per se. I don’t know nearly enough to even attempt that. Amateur though I am, I do argue with the relative proportions allocated to each of the five strategies in this test.

With three of these, I have no quarrel. Firstly (and my ordering is different from that in the document), questions can be made easier or harder in terms of accessibility: how easy is it to find the information? Is the student signposted to it (e.g. ‘see the first paragraph on page 2’)? Or is the question difficulty raised by not signposting and possibly by having distractor items to lure students down dead ends? I think we have little to complain about here. For example, question 30 has clear signposting (‘Look at the paragraph beginning: Then, in 2005…’), whereas in question 33 the relevant information is much harder to find – it’s a ‘match the summary of the paragraph to the order in which they occur’ question.

Secondly, questions may vary in terms of task-specific complexity. How much work does the student have to do to answer the question? Is it a simple information retrieval task or does the pupil have to use inference?

For example, question 7 is easy in this regard: ‘Write down three things you are told about the oak tree.’ The text clearly says the oak tree was ‘ancient’. I haven’t checked the mark scheme as it’s not yet published as I write, but I am assuming that’s enough to earn you 1 mark. Question 3 is a bit harder: ‘How can you tell that Maria was very keen to get to the island?’ Students need to infer this from the fact that she said something ‘impatiently’. There are far fewer of this kind of question under this new regime – but we were expecting that and the sample paper demonstrated that. Again – no complaints. Indeed the test specification does share the relative weightings of different skills (in section 6.2.2, table 9), but the bands are so wide it’s all a bit meaningless. Inference questions can make up between 16% and 50% of all questions, for example.

Thirdly, the response strategy can be more or less demanding: a one-word answer versus a three-mark ‘explain your opinion’ question.

The two final ways to make questions more or less difficult are by either varying the extent of knowledge of vocabulary required by the question (strategy 5 in the specification document) or by varying the complexity of the target information that is needed to answer the question (strategy 2). The document goes on to explain that this means varying

• the lexico-grammatical density of the stimulus

• the level of concreteness / abstractness of the target information

• the level of familiarity of the information needed to answer the question

and that ‘…There is a low level of semantic match between task wording and relevant information in the text.’

I’m not quite sure what the difference is between ‘lexico-grammatical density’ (strategy 2) and knowledge of vocabulary required by the question (strategy 5), but the whole thrust of this piece is that the texts were pretty dense lexico-grammatically and in terms of the vocabulary needed to answer the questions. When compared with the sample test, for example, the contrast is stark. Now I’m no expert in linguistics or test question methodology. I’m just a headteacher with an axe to grind, a weekend to waste and access to Google. But this has infuriated me enough to do a fair bit of reading around the subject.

On the Monday evening after the test, Twitter was alive with people quoting someone who apparently had said that the first paragraph of the first text had a Flesch-Kincaid reading ease equivalent to that of a 15 year old. I’d never heard of Flesch-Kincaid – or any other of the readability tests – so I did some research and found out that, indeed, the first paragraph of The Lost Queen was described as suitable for 8th-9th graders – or 13-15 year olds in the British system. But there was also criticism online that the readability tests rated the same texts quite differently so weren’t a reliable indicator of much. (Someone put a link up to an article about this, which I foolishly forgot to bookmark and now can’t find – do share the link again if it was you or you know a good source.)*
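For anyone curious how such a score is arrived at, here is a minimal sketch of the two standard Flesch formulas: Reading Ease (higher means easier; scores of roughly 60 to 70 are conventionally described as 8th-9th grade) and the Flesch-Kincaid Grade Level. The coefficients are the published ones. The function names, the sentence splitting and especially the syllable counter are assumptions made for illustration: the counter simply counts runs of vowels, so online calculators that use better syllable rules will give somewhat different numbers. It is not the tool used for the figures in this post.

```python
import re


def count_syllables(word):
    # Crude heuristic (an assumption, not a dictionary lookup):
    # treat each run of vowels as one syllable, minimum one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text):
    """Return (reading_ease, grade_level) for a non-empty English text."""
    # Naive sentence split on ., ! and ? (abbreviations will confuse it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Flesch Reading Ease: higher score = easier text.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: approximate US school grade.
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


ease, grade = flesch_scores("The cat sat. The dog ran.")
print(round(ease, 2), round(grade, 2))
```

Both formulas look only at sentence length and syllables per word. They know nothing about how rare a word is, which is one reason different tests can rate the same passage quite differently.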

Anyway, be that as it may, I decided to run some readability tests on various bits and pieces of the SATs paper. And this is what I discovered (texts listed in order of alleged difficulty).

The Lost Queen, first paragraph: 13-15 year olds

Wild Ride, first paragraph: 13-15 year olds

Wild Ride, ‘bewildered’ paragraph: 18-22 year olds

Way of the Dodo, first paragraph: 13-15 year olds (and lower score than The Lost Queen)

Way of the Dodo, second paragraph: 13-15 year olds

So there you have it: insofar as Flesch-Kincaid has any reliability, the supposedly hardest text was in fact the easiest, and the middle text was the hardest.

I did the same with the sample paper. The first text had a readability level of an 11-12 year-old and the second 13-15. I had lost the will to live by then so didn’t do the third text – but it is clearly much more demanding than the previous two – as it should be.

I also used the Automated Readability Index and, while this gave slightly different age ranges, the relative difficulty was the same and all the texts were for children older than 11, the easiest being… the first part of Way of the Dodo.
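The Automated Readability Index works from characters per word rather than syllables, which may explain the slightly different age ranges. A minimal sketch of the standard formula follows. The function name and the simple tokenising are assumptions made for illustration, not the calculator used above.

```python
import re


def automated_readability_index(text):
    """Approximate US grade level from characters per word and words per sentence."""
    # Naive sentence split on ., ! and ? (an assumption; real tools are smarter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    # Only letters and digits count as characters; punctuation is ignored.
    characters = sum(1 for ch in text if ch.isalnum())

    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


print(round(automated_readability_index("The cat sat. The dog ran."), 2))
```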

However, it was also clear from my reading that readability tests are designed to help people writing, say, pamphlets for the NHS make the writing as transparent and easy as possible. In other words, they are intended to make reading simple so people who aren’t very good at it can understand stuff that may be very important. It struck me that maybe this wasn’t exactly what we should be aiming for in a reading assessment. After all, we do want some really challenging questions at some point. We just want them at the end, where they are meant to be. We need readability tests because previous generations have not been taught well enough to be presented with demanding information. We want better for the children we now teach.

Which brought me to discover this site, which ranks words by their relative frequency in the English language. If we are going to be held accountable for the sophistication of the vocabulary our children can comprehend, then surely there should be some bounds on that. While the authority of this is contested, it seems to be generally held that the average adult knows about 20,000 words. You can test yours here. How many words the average 11 year old does or should know I did not discover – so here are my ball park suggestions.

For the first text – the one that is meant to be easier – there should be a cap on words ranked below 10,000. (I’m assuming here we understand that as words are used less frequently their ranking falls but the actual number rises: a ranking of 20,000 is lower than a ranking of 10,000. If this is not the correct convention for such matters, I apologise.) Definitions should be given for low frequency words, especially if understanding them is critical to answering specific questions, in the same way that ‘savannah’ was explained in the introduction to Wild Ride.

Then in the second text words could be limited to 15,000, and in the third 20,000 – representing the average adult’s vocabulary. I have plucked these figures from the air. I would not go to the stake for them. But you get my meaning. We need to pin down the domain of ‘vocabulary’ if we are to be held accountable when it is tested.
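To make the idea concrete, here is a small sketch of how a draft text could be checked against such a cap. Everything about it is hypothetical: the function name `words_over_cap` is invented for illustration, the caps are the plucked-from-the-air figures above, and the rankings are a handful copied from the table later in this post. A ranking of `None` stands for a word that does not appear in the 60,000-word list at all.

```python
def words_over_cap(rankings, cap):
    """Return the words that are rarer than the cap allows.

    `rankings` maps each word to its frequency ranking number
    (bigger number = rarer word), or None if the word is not listed.
    """
    return [word for word, rank in rankings.items()
            if rank is None or rank > cap]


# A few words from the first text (The Lost Queen), suggested cap 10,000.
first_text = {
    "monument": 4106,
    "ancestor": 4178,
    "clack": 32467,
    "hush": 15394,
    "haze": 8307,
    "mossy": 19480,
}

# A few words from the second text (Wild Ride), suggested cap 15,000.
second_text = {
    "jockey": 9723,
    "sedately": 38421,
    "skittishly": None,  # not in the 60,000-word list
}

print(words_over_cap(first_text, 10_000))
print(words_over_cap(second_text, 15_000))
```

Any word a check like this flags would either be swapped for a more common one or given a definition alongside the text.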

For what it is worth, I asked our year 6 after the test to tell me which words they did not know. There are 30 children in the class. Words where half the class or more did not know the meaning included, from the first text: monument, haze, weathered (as a verb); from the second text: jockey, dam, promptly, sedately (zero children), counselled, arthritic, nasal, pranced, skittishly (zero children), milled, bewildered, spindly, momentum; from the third text: haven, oasis (they knew this was a brand of drink though), parched, receding, rehabilitate and anatomy. My Geordie partner tells me they would have known parched if they were northern because that’s Geordie for ‘I’m really thirsty.’ Here we can see again that the middle passage had the highest number of unknown words in my obviously unrepresentative sample. In fact, it was the first paragraph on page 8, which henceforth shall be known as the bewildering paragraph, that seemed to have the highest lexico-grammatical density. As the mud flats entrapped the Mauritian dodos, so did this paragraph ensnare our readers, slowing them down to the extent that they failed to finish the questions pertaining to the relatively easy final text.

Maybe I’m wrong. Maybe when the statistics are finally in, there won’t be a starker-than-usual demarcation along class lines. I’d love to be wrong. Let’s hope I am.

And finally, what you’ve all been waiting for – what was the lowest ranking word? Well yes of course, it was ‘skittishly’; so rare it doesn’t even appear in the database of 60,000 words I was using. But suitable for 11 year olds, apparently.

In case you are interested, here are the full rankings. Where the word might be more familiar as a different part of speech I have included a ranking for that word too, in italics. The words I chose to rank were just those my deputy and I thought children might find tricky. Texts are abbreviated TLQ (The Lost Queen), WR (Wild Ride) and WD (Way of the Dodo).

Word (organized by rank lowest to highest)	Part of speech	Ranking	Text	Necessary (N)/useful (U) for question number
skittishly	adverb	<60,000	WR
parch	adjective	46,169	WD	E29
sedately	adverb	38,421	WR
clack	noun	32,467	TLQ	4 distractor
sedate	verb	23,110
prance	verb	22,360	WR
skittish	adjective	21,298
sedate	adjective	20,481
arthritic	adjective	20,107	WR
mossy	adjective	19,480	TLQ	U8 distractor E9
misjudge	verb	19,140	WD
spindly	adjective	19,025	WR
bewilderment	noun	17,410	WR	E16
squeal	noun	17,103	WR
sternly	adverb	16,117	WR
plod	verb	16,053	WR
prey	verb	15,771	WD
dismount	verb	15,601	WR
hush	noun	15,394	TLQ	N4
burrow	noun	14,900	WR
hush	verb	14,295
nocturnal	adjective	13,755	WR	E14
enraged	adjective	13,378	WR
mill	verb	13,378	WR	E16
sprint	noun	12,187	WR
squeal	verb	12,036
folklore	noun	11,722	WD
rehabilitate	verb	11,496	WD	E31
oasis	noun	10,567	WD	U29
stern	adjective	10,377
moss	noun	10,142
evade	verb	9,759	WR
sight	verb	9,730	WD
jockey	noun	9,723	WR
weather	verb	9,568	TLQ	U8
blur	noun	9,319	WR
murky	adjective	9,265	TLQ	U6
inscription	noun	9,164	TLQ	U8
rein	noun	8,793	WR
sprint	verb	8,742
slaughter	noun	8,494	WD
anatomy	noun	8,310	WD	E32
haze	noun	8,307	TLQ	4 distractor
counsel	verb	7,905	WR
nasal	adjective	7,857	WR
recede	verb	7,809	WD
intent	adjective	7,747	WR
stubborn	adjective	7,680	WR question	E15
slab	noun	7,585	TLQ	U8
arthritis	noun	7,498	WR
promptly	adverb	6,762	WR
blur	verb	6,451
haven	noun	5,770	WD
vine	noun	5,746	TLQ	U6
defy	verb	5,648	WR	E15
startle	verb	5,517	WR
drought	noun	5,413	WD
remains	noun	5,375	WD
devastating	adjective	4,885	WD
rehabilitation	noun	4,842
prey	noun	4,533
dam	noun	4,438	WR
momentum	noun	4,400	WR	E18
ancestor	noun	4,178	TLQ	N1
monument	noun	4,106	TLQ	U8
dawn	noun	4,044	WR	N12a
intent	noun	3,992
click	noun	3,822
counsel	noun	3,441
indication	noun	3,401	WD
prompt	adjective	3,142
mount	verb	3,012
urge	verb	2,281	WR
cast	verb	2,052	WR
judge	verb	1,764
unique	adjective	1,735	WD	E25
weather	noun	1,623
sight	noun	1,623

Word (organized by where they appear in the texts)	Part of speech	Ranking	Text	Necessary (N)/useful (U) for question number
monument	noun	4,106	TLQ	U8
ancestor	noun	4,178	TLQ	N1
clack	noun	32,467	TLQ	4 distractor
hush	noun	15,394	TLQ	N4
hush	verb	14,295
haze	noun	8,307	TLQ	4 distractor
vine	noun	5,746	TLQ	U6
murky	adjective	9,265	TLQ	U6
weather	verb	9,568	TLQ	U8 distractor E9
weather	noun	1,623
mossy	adjective	19,480	TLQ	U8 distractor E9
moss	noun	10,142
inscription	noun	9,164	TLQ	U8
slab	noun	7,585	TLQ	U8
dawn	noun	4,044	WR	N12a
cast	verb	2,052	WR
jockey	noun	9,723	WR
dam	noun	4,438	WR
startle	verb	5,517	WR
nocturnal	adjective	13,755	WR	E14
promptly	adverb	6,762	WR
prompt	adjective	3,142
stubborn	adjective	7,680	WR question	E15
defy	verb	5,648	WR
sedately	adverb	38,421	WR
sedate	verb	23,110
sedate	adjective	20,481
plod	verb	16,053	WR
arthritic	adjective	20,107	WR
arthritis	noun	7,498	WR
nasal	adjective	7,857	WR
squeal	noun	17,103	WR
squeal	verb	12,036
burrow	noun	14,900	WR
prance	verb	22,360	WR
skittishly	adverb	<60,000	WR
intent	adjective	7,747	WR
intent	noun	3,992
enraged	adjective	13,378	WR
squeal	noun	17,103	WR
squeal	verb	12,036
mill	verb	13,378	WR	E16
bewilderment	noun	17,410	WR	E16
spindly	adjective	19,025	WR
evade	verb	9,759	WR
momentum	noun	4,400	WR	E18
urge	verb	2,281	WR
sprint	verb	8,742
sprint	noun	12,187	WR
blur	noun	9,319	WR
blur	verb	6,451
dismount	verb	15,601	WR
mount	verb	3,012
sight	verb	9,730	WD
sight	noun	1,623
haven	noun	5,770	WD
slaughter	noun	8,494	WD
unique	adjective	1,735	WD	E25
prey	verb	15,771	WD
prey	noun	4,533
folklore	noun	11,722	WD
remains	noun	5,375	WD
drought	noun	5,413	WD
oasis	noun	10,567	WD	U29
parch	adjective	46,169	WD	E29
recede	verb	7,809	WD
rehabilitate	verb	11,496	WD	E31
indication	noun	3,401	WD
anatomy	noun	8,310	WD	E32
misjudge	verb	19,140	WD
judge	verb	1,764
devastating	adjective	4,885	WD

By way of contrast I did the same with the sample paper. In the first text there were no words I thought were hard enough to check. In the second there were 4: cover (15,363), pitiful (13,211), brittle (10,462) and emerald (12,749). In the third and final passage there were 8: triumphantly (16,343), glade (20,257), unwieldy (16,922), sapling (16,313), foliage (7,465), lurch (9,339), ecstasy (9,629) and finally, ranking off the scale below 60,000, gambols.

12 thoughts on “Milling around in bewilderment: that reading comprehension.”

Garry Minto says:

Really interesting article. It looks as though the reading test was very poorly designed indeed, with little thought given to how children might approach the reading, and little research or even thought given to defining the knowledge domain. The Maths papers, with a few bits of weirdness, by and large at least had a clear knowledge domain. I need to check it, but I believe most newspapers are written for reading ages below 15. Michael Rosen has written lots about how norm referenced tests like SATs are designed to produce failure. For him, that is their purpose. The government, he says, needs some schools to fail in order to push through the academies programme. I’m not sure there is enough competence displayed in the test design here to support a conspiracy theory, and I do hope, like Clare that demographic analysis will show that kids from poorer backgrounds, with less access to a ‘rich vocabulary’, will not be greatly disadvantaged . I love the idea that some Geordie kids will still know ‘parched’. Oh, and in my day, ‘a dod’ is what we called Frank Clarke the then Newcastle United fullback.

May 20, 2016 at 7:26 pm
squirrelclass says:

This is mind-blowing… and so very needed to prove that what the children were asked to do was just plain unrealistic. Thank you.

Here it is, from the children’s perspective- http://squirrelclass.edublogs.org/2016/05/20/dear-nicky-morgan/

May 20, 2016 at 7:40 pm
1. oldprimarytimer says:
  
  Just read your yr 6’s letter to NiMo – good going! Have put a comment for them.
  
  May 21, 2016 at 11:31 am
SPAG & Grammar says:

Please sort the spelling error! “True only 50% of our present year 6 new what a monument was at the outset”

May 20, 2016 at 7:42 pm
1. oldprimarytimer says:
  
  *blushes* Thanks – done!
  
  May 21, 2016 at 11:25 am
@MrTRoach says:

Loving your work. It was a nightmare of a test for my Year 6s too. The language-poor children really struggled, which includes those with EAL (although not all by any means) and poor White British.

Couldn’t find you on Twitter, so here’s the link to the Daniel Willingham article on readability scales that I posted after the test.

http://www.danielwillingham.com/daniel-willingham-science-and-education-blog/evaluating-readability-measures

Doug Lemov also mentions some of the unreliability and odd results in Reading Reconsidered.

May 22, 2016 at 8:44 am
1. oldprimarytimer says:
  
  Thanks-that’s really useful. I’ve got ‘Reading Reconsidered’ but haven’t started reading it yet. I’m @claresealy on Twitter btw
  
  May 22, 2016 at 9:07 am
2. oldprimarytimer says:
  
  I’ve updated the post now to add in your link as not everybody reads the comments. Thanks again
  
  May 22, 2016 at 9:25 am
Primary Assessments | @TeacherToolkit says:

[…] There was also no mention of the reading test. There were hints, as the words ‘more challenging‘ and ‘stretching‘ were emphasised. However, none of the tears experienced by year six children last year were mentioned, nor was the huge hike in requirement for vocabulary. (OldPrimaryTimer has done some very interesting work on the readability of the texts here.) […]

October 26, 2016 at 3:05 pm
How to deepen understanding of reading by teaching children how to enjoy it better. – primarytimerydotcom says:

[…] Milling around in bewilderment: that reading comprehension. […]

October 30, 2016 at 10:41 am
Test to the Teach – primarytimerydotcom says:

[…] Ancient Greece if you realised children didn’t know), or, for that matter, what it is like to mill around in bewilderment. The only kind of assessment that will help here is the teacher’s ‘ear to the […]

February 18, 2017 at 3:53 pm
BLOGAGGEDON – mrmorgs says:

[…] Milling around in bewilderment: that reading comprehension. […]

March 17, 2019 at 11:35 am