Monday, October 1, 2012

Data from the phonics screen: a worryingly abnormal distribution




The new phonics screening test for children has been highly controversial. I’ve been surprised at the amount of hostility engendered by the idea of testing children’s knowledge of how letters and sounds go together. There’s plenty of evidence that this is a foundational skill for reading, and that poor phonics ability is a good predictor of later reading problems. So while I can see there are aspects of the implementation of the phonics screen that could be improved, I don’t buy arguments that it will ‘confuse’ children, or prevent them reading for meaning.




I discovered today that some early data on the phonics screen had recently been published by the Department for Education, and my inner nerd was immediately stimulated to visit the website and download the tables. What I found was both surprising and disturbing.




Most of the results are presented in terms of proportions of children ‘passing’ the screen, i.e. scoring 32 or more. There are tables showing how this proportion varies with gender, ethnic background, language background, and provision of free school meals. But I was more interested in raw scores: after all, a cutoff of 32 is pretty arbitrary. I wanted to see the range and distribution of scores. I found just one table showing the relevant data, subdivided by gender, and I have plotted the results here.




Data from Table 4, Additional Tables 2, SFR21/2012, Department for Education (weblink above)




Those of you who are also statistics nerds will immediately see something very odd, but other readers may need a bit more explanation. When you have a test like the phonics test, where each item is scored right or wrong and the number of correct items is totalled up, you’d normally expect a continuous distribution of scores. That is to say, the number of children obtaining a given score should increase gradually up to the most typical score (the mode), and then gradually decline again. If the test is pretty easy, you may get a ceiling effect, i.e. the mode may be at or close to the maximum score, so you will see a peak at the right-hand side of the plot, with a long straggly tail of lower scores. There may also be a ‘bump’ at the left-hand edge of the distribution, corresponding to those children who can’t read at all – a so-called ‘floor’ effect. That’s evident in the scores for boys.

But there’s also something else: a sudden upswing in the distribution, just at the ‘pass’ mark. Okay, you might think, that’s because the clever people at the DfE have designed the phonics test that way, so that 31 of the items are really easy and most children can read them, but the remaining items suddenly get much harder. Well, that seems unlikely, and it would be a rather odd way to develop a test, but it’s not impossible. The really unbelievable bit is the distribution of scores just above and below the cutoff. What you can see is that for both boys and girls, fewer children score 31 than 30, in contrast to the general upward trend seen at lower scores. Then there’s a sudden leap, so that about five times as many children score 32 as score 31. But then there’s another dip: fewer children score 33 than 32. Overall, there’s a kind of ‘scalloped’ pattern to the distribution of scores above 32, which is exactly the kind of distribution you’d expect if a score of 32 were acting as a kind of ‘floor’. But, of course, 32 is not the test floor.
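The pattern is easy to reproduce in a toy simulation. The sketch below is entirely hypothetical – the ability distribution, and the assumption that 80% of near-miss scores get nudged up, are made-up illustrations, not the DfE’s data or procedure – but it shows how moving some 30s and 31s up to 32 produces exactly this signature: dips either side of a spike at the pass mark.

```python
import random

random.seed(1)

N = 50_000       # hypothetical number of children
ITEMS = 40       # the screen has 40 items
CUTOFF = 32      # the 'pass' mark

def simulate(nudge=False):
    """Tally scores on a 40-item test for N simulated children."""
    counts = [0] * (ITEMS + 1)
    for _ in range(N):
        # Skewed ability distribution, so most children score near ceiling
        ability = random.betavariate(5, 1.5)
        # Each item is answered correctly with probability = ability
        score = sum(random.random() < ability for _ in range(ITEMS))
        # The suspected bias: most children scoring just below the pass
        # mark get 'the benefit of the doubt' (80% is an arbitrary guess)
        if nudge and CUTOFF - 2 <= score < CUTOFF and random.random() < 0.8:
            score = CUTOFF
        counts[score] += 1
    return counts

honest = simulate()
nudged = simulate(nudge=True)

# Counts for scores 29..33: honest rises smoothly; nudged shows
# dips at 30 and 31 and a spike at 32
print(honest[29:34])
print(nudged[29:34])
```

With no nudging, counts change smoothly from one score to the next; with the nudge applied, the counts at 30 and 31 collapse and the count at 32 jumps, giving the scalloped shape seen in the published figures.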





This is so striking, and so abnormal, that I fear it provides clear-cut evidence that the data have been manipulated, so that children whose scores would put them just one or two points below the magic cutoff of 32 have been given the benefit of the doubt, and had their scores nudged up above the cutoff.




This is most unlikely to indicate a problem inherent in the test itself. It looks like human bias that arises when people know there is a cutoff and, for whatever reason, are reluctant to have children score below it. As one who is basically in favour of phonics testing, I’m sorry to put another cat among the educational pigeons, but on the basis of this evidence, I do query whether these data can be trusted.
