Assessment and Tests

24th October 2016

Since we published our general note on assessment, there have been a number of responses, published and unpublished, supportive and critical. This blog aims both to explain our view on testing in more detail and to answer some of the questions raised by others.

Why we support tests over other forms of assessment [1]

Different forms of assessment are useful in different contexts, but we would argue that a) tests are a particularly good form of assessment; and b) while there are issues with the government’s current arrangements on primary assessment, the existence of tests which are “high stakes” for the school, but not the pupil, is not the main issue.

Well-designed tests (and obviously not all tests are well-designed) have a number of characteristics that make them particularly good for assessing students.

First, they are reliable and valid compared with other forms of assessment. In other words, the results they give are not random (if the same student took the same test under the same conditions, they would get the same result) and they allow you to make inferences about things that are important and which we are interested in – for example, if a pupil can write accurately and originally, or if they are capable of solving maths problems.

This is particularly true when compared with teacher assessment, which is another commonly used form of assessment in primary school, particularly for ‘high stakes’ purposes (more on this below).

Teacher assessment, despite the best of intentions, is systematically biased against pupils with SEN, those with challenging behaviour, EAL and FSM students, and those whose personalities differ from the teacher’s. This is not because teachers are bad humans, but because teachers are humans. The research that won the cognitive scientist Daniel Kahneman his Nobel Prize demonstrated the inherent bias in all humans. What Kahneman found is that even when you are aware of that bias you still can’t act entirely objectively. Tests remove that bias.

Below are some quotes from academic papers on bias in teacher assessment.

Both high and medium weight evidence indicated the following: there is bias in teachers’ assessment (TA) relating to student characteristics, including behaviour (for young children), gender and special educational needs; overall academic achievement and verbal ability may influence judgement when assessing specific skills. (Harlen, 2004)

Studies of the National Curriculum Assessment (NCA) for students aged 6 and 7 in England and Wales in the early 1990s, found considerable error and evidence of bias in relation to different groups of students (Shorrocks et al., 1993; Thomas et al., 1998). (Ibid)

It is argued that pupils are subjected to too many written tests, and that some should be replaced by teacher assessments… The results here suggest that might be severely detrimental to the recorded achievements of children from poor families, and for children from some ethnic minorities. (Burgess and Greaves, 2009)

Teachers tended to perceive low-income children as less able than their higher income peers with equivalent scores on cognitive assessments. (Campbell, 2015)

Teacher assessment is also really hard work compared with tests: it massively increases workload, which we know is a huge problem in schools. Teachers have to work so hard anyway that we should make assessment as easy and reliable for them as possible. We would argue that a lot of the problems with the current primary assessment regime are the result of over-reliance on teacher assessment, not over-reliance on tests.

Second, tests are, surprisingly, a good way of learning. Robert Bjork’s lab at UCLA has done a lot of research into the “testing effect” on memory, finding that the act of testing (and the recall required) is a good way of committing information to long-term memory.

All of these, though, depend on a well-designed test. The phonics check is a particularly good test, because it tests the whole ‘domain’ (all the knowledge you should have) and because it is done under thoughtfully controlled conditions.

Other tests are less well designed. For example, it is slightly strange that the reading test at Key Stage 2 does not use topics from the National Curriculum, when we know that literacy is very highly dependent on domain specific knowledge (see our knowledge research note for more information). This would be an obvious way to increase the fairness of that test, particularly for students from disadvantaged backgrounds.

Tests in different circumstances

Many reading this may agree with the points above, but still have issues with the use of ‘high stakes’ tests, which usually means tests for accountability purposes. Before we address this, we think it’s important to clarify some of the definitional issues that arise when discussing testing and assessment, because they often cause real confusion in debate. These are briefly summarised below.

Formative and Summative assessment

As most people involved professionally in education know, assessment is often split between:

Formative. Broadly, this is assessment designed to help students and teachers identify strengths and weaknesses, and therefore adapt how they teach the same material.
Summative. This is designed to evaluate whether the student has learned the sum of the material or not.

As with most of these terms, the distinction is slightly less simple than it appears. For example, an end of unit ‘test’ is summative – it tests whether you have learned the material. But if you are spending large chunks of time on revision or intervention – with the class as a whole, groups, or individuals – then it can be used formatively.

Formative and summative assessment are often conflated with low and high stakes tests, but they are not the same (see below).

Controlled conditions

Controlled conditions hold (usually interrelated) variables that could affect a test (or, in science, an experiment) constant. There are many ways of doing this.

Sometimes people assume that controlled conditions are the same as a traditional exam (what a parent might remember from their school days, sitting in rows with an invigilator standing at the front of the class). On that assumption, it is argued that the phonics check is not done under controlled conditions.

This isn’t true. The phonics check is held under controlled conditions – it must be done in one window of time, and the child can only attempt it once. It is held in a separate room: the guidance, in fact, on how to control for variables is quite detailed.

This is a perfect example of a test that is not stressful for the pupil, and doesn’t even necessarily seem like a test. It is, despite that, a highly reliable way of testing a student’s mastery of a domain.

Rigorous and tough tests

Our original note made reference to ‘rigour’ and ‘tough’ tests. This is, again, different from a test being summative or high stakes. Formative assessment can be more or less rigorous or tough. Testing can form part of formative as well as summative assessment.

Rigour, in our note, probably unhelpfully conflated a number of things. First, assessments must be thoughtfully designed and reliable. Interestingly, while there is sometimes an assumption that tests are impossible to reliably administer to young children, they are in fact often more reliable than teacher assessment. The phonics check, and the CEM baseline assessment, are highly reliable tests.

Second, it refers to the expected level of the test. This is the same as what we mean by ‘tough’. Some have made the leap that ‘tough’ must mean ‘difficult for children’. This is interesting, and worth unpacking. ‘Tough’ reflects our view on what students should know and be able to do: tough is not therefore a comment on the percentage of questions that are beyond the reach of students, but on the expectations we should have of those students (and, therefore, how difficult in general the questions should be). With proper preparation and clear information, students do not need to be distressed by this.

High stakes and low stakes

High stakes and low stakes are quite poorly defined terms for an obvious reason – high stakes or low stakes for whom? It usually means one of three things:

High stakes for the pupil. As our original note pointed out (and this was the most important point we were making) there are no high stakes tests for the pupil until students are 16.[2] No parent or child should feel stressed by any assessment in primary school, because the result is irrelevant to their future (except to the extent that the school and teacher change their teaching or behaviour as a result). Qualifications at 16 and 18, on the other hand, matter a lot to a student’s future. They are the only high stakes exams for the pupil.
High stakes for the school. Key Stage 2 tests, in primary school, are high stakes for the school but not the pupil. For example, if students don’t perform well, then the school may be given to new management.
High stakes for the teacher. Tests that are high stakes for the school are usually also high stakes for a teacher. But it is also possible for tests to be high stakes for the teacher because the school chooses to make them high stakes (end of year exams for example).

High stakes tests are the most debated issue in primary school assessment.

Our original note stressed that pupils have no reason to feel unduly stressed by primary assessments, because they have no impact on their futures. We are very worried that in some schools the stress of the school and the teacher (for whom the tests are high stakes) gets passed down to pupils. Our advisory council member Jonathan Simons has written an excellent piece on this subject.

The stress for students is also in part due to how test preparation is done in some schools. The NUT has said “Teachers report that a curriculum which is organised around the demands of test preparation has demotivated and disengaged children. Instances of emotional and mental health problems have risen”. This is not a result of the tests themselves, but of how these tests have been prepared for and how they have been presented to children.

Some have argued that the difficulty of questions is what causes students stress. This is an interesting argument, but not one that many of the best performing schools accept: if you prepare students in advance for the fact that the test will include questions they cannot answer, they will not be distressed. To give an extreme example, the requirement for a first at university is often 70%. This means that even the very best performing students in a quantitative subject can get three in ten questions wrong.

Another argument is that high stakes tests (for the school) cause a narrowing of the curriculum.

There are unquestionably ways of ‘teaching to the test’ that are demotivating, stressful, and difficult, and there are real risks with national tests. This includes the narrowing described. That does not mean the test is bad, or that the only way for students to do well is by ‘teaching to the test’ in a narrow and boring way. Great schools do not do this. We’ll address this further in future notes.

Part of our campaign is to encourage schools to adopt the practices of the best. That means getting great results in tests (which are designed to discover whether students can read, or do basic mathematics: these are crucial for their success and later learning) without poor teaching practice and grind.

[1] Anyone interested in a much fuller treatment of these issues should read Daisy Christodoulou’s blog The Wing to Heaven https://thewingtoheaven.wordpress.com

[2] There is an issue with year 7 resits, and it is sensible that the government has decided to drop these.