
Thursday, September 12, 2013

Evaluate, evaluate, evaluate

When I was
starting out on a doctorate, I’d look at the senior people in my field and
wonder if I’d ever be like them. It must be great, I thought, to reach the
advanced age of 40. By then you’d have learned everything you needed to know to
do great science, and you could just focus on doing it. I suspect today’s crop
of grad students are a bit more savvy than I was, but all the same, I wonder if
they realise just how wrong that picture is – for two reasons.

First, you never stop learning. The field moves on. Instead of getting easier, it gets harder. I
remember when techniques such as functional brain imaging first came along. The
most competent people in that area were either those who had developed the
methods, or young people who learned them as grad students. If you were of the
generation above, you had three choices: ignore the methods, spend time
learning them, or hire junior people who knew what they were doing. As the
methods evolve, they get ever more complex, and meanwhile, your own brain
starts to shrink. So if you are anticipating making it to a tenured post and
then settling down in your armchair, think again.

Second, the more
senior you get, the more of your time is spent, not on doing your own research,
but on evaluation. You learn that an email entitled ‘invitation’ should not
make your spirits rise: it’s just a desperate attempt to put a positive spin on
a request for you to do more work for no reward. You get regular ‘invitations’ to review
papers and grants, write job references, appraise promotion bids, sit on
interview panels and examine theses. If you are involved in teaching, you’ll
also be engaged in numerous other forms of appraisal.

I was prompted to
think about this when someone asked on an electronic forum what was a
reasonable number of doctoral theses to examine each year. The general consensus was two, though it will
obviously depend on what other commitments someone has. It also varies from
country to country. There are some jolly
places in Europe where a PhD viva is just an excuse for a boozy party with a
lot of dressing up in funny gowns and hats. In UK psychology, the whole thing
is no fun at all: you have to read a document of 50,000-70,000 words reporting
a body of work based on a series of experimental studies. You then write a
report on it and see the candidate for a face-to-face viva, which is typically
2 to 3 hours long. Although failure is uncommon, it is not assumed that the
candidate will pass (unlike in the viva-as-party countries), and weeping or
catatonic candidates are not unheard of. Taking into account travel, etc., if
you are going to do a proper job, you are probably talking about three days’
work. For this you get paid around the minimum wage – the fee for examining is
typically somewhere between £120 and £200.

So why do we do
it? The major reason is that the entire academic enterprise depends on
reciprocity: we want people to examine our students and review our papers and
grants. In addition, it’s important to maintain standards, and to ensure that
degrees, promotions, publications and grants go to those who merit them. But the demands keep growing. In the first 37 weeks of this year I’ve been asked
to review 76 papers and six grants. I agreed to review 16 papers and three of
the grants. This, of course, is nothing compared with being a journal editor or
serving on a grants board, something that most of us will do at some point.

Clearly, if I
agreed to do everything I was asked, I’d have no time for anything else. Of course, one learns to say no. But
awareness of these pressures has made me look with rather a critical eye at how
we use evaluation. There is, for instance, research suggesting that job interviews aren’t very useful at identifying good candidates: we tend to be seduced by immediate
impressions, which may not be a good indicator of a person’s suitability. Like
most people, I’d be reluctant to take on an employee I hadn’t interviewed, but
if Daniel Kahneman is to be believed, this is just because I am a victim of the
Illusion of Validity.

I’m a supporter of the peer review system used by
journals, and here I feel I’m on more
solid ground, because I can point to instances where my papers have been
improved by input from reviewers. Nevertheless, where reviewing is used simply
to reject/accept papers or grant proposals,
and where fine-grained decisions have to be made between many
high-quality submissions, agreement between experts may be little better than chance
(e.g. Fogelholm et al., 2012). Even so, we stick with it, because it’s hard
to know what to put in its place.

I’ve written a fair bit about that expensive and time-consuming evaluation process that UK academics
engage in, the REF. It requires experts to make
judgements of whether, for instance, papers are of 3* or 4* quality, a
distinction based on whether the research is “world leading” or “internationally
excellent…. but falls short of the highest standards of excellence.” The reliability of such judgements has not, to my knowledge, been evaluated, yet large amounts of funding depend on them. Those on REF committees are in the same situation as Pavlov’s poor dogs, having
to make distinctions that are on the one hand impossible (discriminating
circles and ellipses that become increasingly similar) and on the other hand
very important (get it wrong and you get a shock).

There is one good
thing about doing so much evaluation. You have the opportunity to see what
others are doing – you may be the first person to read an important new paper,
or examine a ground-breaking thesis. You may be forced to engage with different
ways of thinking, and confronted with new topics and ideas. You may be able to provide useful input to authors. And since you
yourself will be evaluated, it can be useful to see life from the other side of
the table, as the person doing the evaluating. But all too often, even these
advantages fail to compensate for the fact that as a senior academic you will
spend more and more time on evaluation of others and less and less doing your
own research.

Reference

Fogelholm, Mikael, Leppinen, Saara, Auvinen, Anssi, Raitanen, Jani, Nuutinen, Anu, & Väänänen, Kalervo (2012). Panel discussion does not improve reliability of peer review for medical research grant proposals. Journal of Clinical Epidemiology, 65 (1), 47-52. DOI: 10.1016/j.jclinepi.2011.05.001

Saturday, January 19, 2013

Journal Impact Factors and REF 2014

In 2014, British institutions of Higher Education are to be evaluated in the Research Excellence Framework (REF), an important exercise on which their future funding depends. Academics are currently undergoing scrutiny by their institutions to determine whether their research outputs are good enough to be entered in the REF. Outputs are to be assessed in terms of "‘originality, significance and rigour’, with reference to international research quality standards."

Here's what the REF2014 guidelines say about journal impact factors:

"No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs."

Here are a few sources that explain why it is a bad idea to use impact factors to evaluate individual research outputs:

Stephen Curry's blog

David Colquhoun letter to Nature

Manuscript by Brembs & Munafo on "Unintended consequences of journal rank"

Editage tutorial

Here is some evidence that the REF2014 statement on impact factors is being widely ignored:

Jenny Rohn Guardian blogpost

And here's a letter I wrote yesterday to the representatives of RCUK who act as observers on REF panels about this. I'll let you know if I get a reply.

18th January 2013

To: Ms Anne-Marie Coriat: Medical Research Council
Dr Alf Game: Biotechnology and Biological Sciences Research Council
Dr Alison Wall: Engineering and Physical Sciences Research Council
Ms Michelle Wickendon: Natural Environment Research Council
Ms Victoria Wright: Science and Technology Facilities Council
Dr Fiona Armstrong: The Economic and Social Research Council
Mr Gary Grubb: Arts and Humanities Research Council

Dear REF2014 Observers,

I am contacting you because a growing number of academics are expressing concerns that, contrary to what is stated in the REF guidelines, journal impact factors are being used by some Universities to rate research outputs. Jennifer Rohn raised this issue here in a piece on the Guardian website last November:
http://www.guardian.co.uk/science/occams-corner/2012/nov/30/1

I have not been able to find any official route whereby such concerns can be raised, and I have evidence that some of those involved in the REF, including senior university figures and REF panel members, regard it as inevitable and appropriate that journal impact factors will be factored in to ratings - albeit as just one factor among others. Many, perhaps most, of the academics involved in panels and REF preparations grew up in a climate where publication in a high impact journal was regarded as the acme of achievement. Insofar as there are problems with the use of impact factors, they seem to think the only difficulty is the lack of comparability across sub-disciplines, which can be adjusted for. Indeed, I have been told that it is naïve to imagine that this statement should be taken literally: "No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs."

Institutions seem to vary in how strictly they are interpreting this statement and this could lead to serious problems further down the line. An institution that played by the rules and submitted papers based only on perceived scientific quality might challenge the REF outcome if they found the panel had been basing ratings on journal impact factor. The evidence for such behaviour could be reconstructed from an analysis of outputs submitted for the REF.

I think it is vital that RCUK responds to the concerns raised by Dr Rohn to clarify the position on journal impact factors and explain the reasoning behind the guidelines on this. Although the statement seems unambiguous, there is a widespread view that the intention is only to avoid slavish use of impact factors as a sole criterion, not to ban their use altogether. If that is the case, then this needs to be made explicit. If not, then it would be helpful to have some mechanism whereby academics could report institutions that flout this rule.

Yours sincerely

(Professor) Dorothy Bishop

Reference

Colquhoun, D. (2003). Challenging the tyranny of impact factors. Nature, 423 (6939), 479. DOI: 10.1038/423479a

P.S. 21/1/13

This post has provoked some excellent debate in the Comments, and also on Twitter. I have collated the tweets on Storify here, and the Comments are below. They confirm that there are very divergent views out there about whether REF panels are likely to, or should, use journal impact factor in any shape or form. They also indicate that this issue is engendering high levels of anxiety in many sections of academia.

P.P.S. 30/1/13

REPLY FROM HEFCE

I now have a response from Graeme Rosenberg, REF Manager at HEFCE, who kindly agreed that I could post relevant content from his email here. This briefly explains why impact factors are disallowed for REF panels, but notes that institutions are free to flout this rule in their submissions, at their own risk. The text follows:

I think your letter raises two sets of issues, which I will respond to in turn.

The REF panel criteria state clearly that panels will not use journal impact factors in the assessment. These criteria were developed by the panels themselves and we have no reason to doubt they will be applied correctly. The four main panels will oversee the work of the sub-panels throughout the assessment process, and it is part of the main panels' remit to ensure that all sub-panels apply the published criteria. If there happen to be some individual panel members at this stage who are unsure about the potential use of impact factors in the panels' assessments, the issue will be clarified by the panel chairs when the assessment starts. The published criteria are very clear and do not leave any room for ambiguity on this point.

The question of institutions using journal impact factors in preparing their submissions is a separate issue. We have stated clearly what the panels will and will not be using to inform their judgements. But institutions are autonomous and ultimately it is their decision as to what forms of evidence they use to inform their selection decisions. If they choose to use journal impact factors as part of the evidence, then the evidence for their decisions will differ to that used by panels. This would no doubt increase the risk to the institution of reaching different conclusions to the REF panels. Institutions would also do well to consider why the REF panels will not use journal impact factors - at the level of individual outputs they are a poor proxy for quality. Nevertheless, it remains the institution's choice.