Archive for the Bad Statistics Category

Tension in Cosmology?

Posted in Astrohype, Bad Statistics, The Universe and Stuff on October 24, 2013 by telescoper

I noticed this abstract (of a paper by Rest et al.) on the arXiv the other day:

We present griz light curves of 146 spectroscopically confirmed Type Ia Supernovae (0.03<z<0.65) discovered during the first 1.5 years of the Pan-STARRS1 Medium Deep Survey. The Pan-STARRS1 natural photometric system is determined by a combination of on-site measurements of the instrument response function and observations of spectrophotometric standard stars. We have investigated spatial and time variations in the photometry, and we find that the systematic uncertainties in the photometric system are currently 1.2% without accounting for the uncertainty in the HST Calspec definition of the AB system. We discuss our efforts to minimize the systematic uncertainties in the photometry. A Hubble diagram is constructed with a subset of 112 SNe Ia (out of the 146) that pass our light curve quality cuts. The cosmological fit to 313 SNe Ia (112 PS1 SNe Ia + 201 low-z SNe Ia), using only SNe and assuming a constant dark energy equation of state and flatness, yields w = -1.015^{+0.319}_{-0.201}(Stat)^{+0.164}_{-0.122}(Sys). When combined with BAO+CMB(Planck)+H0, the analysis yields \Omega_M = 0.277^{+0.010}_{-0.012} and w = -1.186^{+0.076}_{-0.065} including all identified systematics, as spelled out in the companion paper by Scolnic et al. (2013a). The value of w is inconsistent with the cosmological constant value of -1 at the 2.4 sigma level. This tension has been seen in other high-z SN surveys and endures after removing either the BAO or the H0 constraint. If we include WMAP9 CMB constraints instead of those from Planck, we find w = -1.142^{+0.076}_{-0.087}, which diminishes the discord to <2 sigma. We cannot conclude whether the tension with flat ΛCDM is a feature of dark energy, new physics, or a combination of chance and systematic errors. The full Pan-STARRS1 supernova sample will be 3 times as large as this initial sample, which should provide more conclusive results.

The mysterious Pan-STARRS stands for the Panoramic Survey Telescope and Rapid Response System, a set of telescopes, cameras and related computing hardware that monitors the sky from its base in Hawaii. One of the many things this system can do is detect and measure distant supernovae, hence the particular application to cosmology described in the paper. The abstract mentions a preliminary measurement of the parameter w, which for those of you who are not experts in cosmology is usually called the “equation of state” parameter for the dark energy component involved in the standard model. What it describes is the relationship between the pressure P and the energy density ρc² of this mysterious stuff, via the relation P=wρc². The particularly interesting case is w=-1, which corresponds to a cosmological constant term; see here for a technical discussion. However, we don’t know how to explain this dark energy from first principles, so really w is a parameter that describes our ignorance of what is actually going on. In other words, the cosmological constant provides the simplest model of dark energy, but even in that case we don’t know where it comes from, so it might well be something different; estimating w from surveys can therefore tell us whether we’re on the right track or not.
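For readers who want to see where the special status of w=-1 comes from, here is the standard textbook argument (nothing specific to the Pan-STARRS analysis): the continuity equation for a fluid in an expanding universe fixes how the dark energy density scales with the scale factor a, and w=-1 is the unique value for which the density stays constant, which is exactly how a cosmological constant behaves.

```latex
% Continuity equation for a fluid with P = w \rho c^2 in an expanding
% universe with scale factor a(t):
\dot{\rho} + 3\,\frac{\dot{a}}{a}\left(\rho + \frac{P}{c^{2}}\right) = 0
\quad \Longrightarrow \quad
\rho \propto a^{-3(1+w)} .
% w = -1 : density constant in time, behaving like a cosmological constant
% w > -1 : density dilutes as the Universe expands
% w < -1 : density grows with time ("phantom" dark energy)
```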

The abstract explains that, within the errors, the Pan-STARRS data on their own are consistent with w=-1. More interestingly, though, combining the supernovae observations with others, the best-fit value of w shifts towards a value a bit less than -1 (although still with quite a large uncertainty). Incidentally, a value of w less than -1 is generally described as a “phantom” dark energy component. I’ve never really understood why…

So far, estimates of cosmological parameters from different data sets have broadly agreed with each other, hence the application of the word “concordance” to the standard cosmological model. However, supernova measurements do generally seem to push cosmological parameter estimates away from the comfort zone established by other types of observation. Could this apparent discordance be signalling that our ideas are wrong?

That’s the line pursued by a Scientific American article on this paper entitled “Leading Dark Energy Theory Incompatible with New Measurement”. This could be true, but I think it’s a bit early to be taking this line when there are still questions to be answered about the photometric accuracy of the Pan-STARRS survey. The headline I would have picked would be more like “New Measurement (Possibly) Incompatible With Other Measurements of Dark Energy”.

But that would have been boring…

Australia: Cyclones go up to Eleven!

Posted in Bad Statistics on October 14, 2013 by telescoper

I saw a story on the web this morning which points out that Australians can expect 11 cyclones this season.

It’s not a very good headline, because it’s a bit misleading about what the word “expected” means. In fact eleven is the average number of cyclones per season, which is not necessarily the number you should expect, despite the fact that “expected value” or “expectation value” is the name statisticians give to the average. If you don’t understand this criticism, ask yourself how many legs you’d expect a randomly-chosen person to have. You’d probably settle on the answer “two”, but that is the most probable number, i.e. the mode, which in this case exceeds the average. If one person in a thousand has only one leg then a group of a thousand people has 1999 legs between them, so the average (or arithmetic mean) is 1.999. Most people therefore have more than the average number of legs…

I’ve always found it quite annoying that physicists use the term “expectation value” to mean “average”, because it implies that the average is the value you would expect. In the example given above you wouldn’t expect a person to have the average number of legs – if you assume that the actual number is an integer, it’s actually impossible to find a person with 1.999 legs! In other words, the probability of finding someone in that group with the average number of legs is exactly zero.

The same confusion happens when newspapers talk about the “average wage”, which is considerably higher than the wage most people receive because the mean is dragged upwards by a relatively small number of very high earners.

In any case the point is that there is undoubtedly a considerable uncertainty in the prediction of eleven cyclones per season, and one would like to have some idea how large an error bar is associated with that value.
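To put a rough number on that error bar, here is a minimal sketch (my own illustration, which assumes the seasonal cyclone count is approximately Poisson-distributed about the quoted mean of eleven; the real forecast model is presumably more sophisticated):

```python
from scipy.stats import poisson

mean_cyclones = 11            # the quoted seasonal average, assumed Poisson for illustration
season = poisson(mean_cyclones)

# Probability that a season produces exactly the "expected" eleven cyclones
print(f"P(exactly 11) = {season.pmf(11):.3f}")      # ~0.12

# A rough error bar: the standard deviation of a Poisson count is sqrt(mean)
print(f"standard deviation = {season.std():.2f}")   # ~3.3

# A central interval containing roughly 90% of seasons
print(f"90% interval = {season.interval(0.90)}")    # approximately (6, 17)
```

So even if eleven really is the mean, anything from about six to seventeen cyclones in a season would be unremarkable under this assumption.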

Anyway, statistical pedantry notwithstanding, it is indeed impressive that the number of cyclones in a season goes all the way up to eleven…

Science, Religion and Henry Gee

Posted in Bad Statistics, Books, Talks and Reviews, Science Politics, The Universe and Stuff on September 23, 2013 by telescoper

Last week a piece appeared on the Grauniad website by Henry Gee who is a Senior Editor at the magazine Nature.  I was prepared to get a bit snarky about the article when I saw the title, as it reminded me of an old  rant about science being just a kind of religion by Simon Jenkins that got me quite annoyed a few years ago. Henry Gee’s article, however, is actually rather more coherent than that and  not really deserving of some of the invective being flung at it.

For example, here’s an excerpt that I almost agree with:

One thing that never gets emphasised enough in science, or in schools, or anywhere else, is that no matter how fancy-schmancy your statistical technique, the output is always a probability level (a P-value), the “significance” of which is left for you to judge – based on nothing more concrete or substantive than a feeling, based on the imponderables of personal or shared experience. Statistics, and therefore science, can only advise on probability – they cannot determine The Truth. And Truth, with a capital T, is forever just beyond one’s grasp.

I’ve made the point on this blog many times that, although statistical reasoning lies at the heart of the scientific method, we don’t do anywhere near enough  to teach students how to use probability properly; nor do scientists do enough to explain the uncertainties in their results to decision makers and the general public.  I also agree with the concluding thought, that science isn’t about absolute truths. Unfortunately, Gee undermines his credibility by equating statistical reasoning with p-values which, in my opinion, are a frequentist aberration that contributes greatly to the public misunderstanding of science. Worse, he even gets the wrong statistics wrong…

But the main thing that bothers me about Gee’s article is that he blames scientists for promulgating the myth of “science-as-religion”. I don’t think that’s fair at all. Most scientists I know are perfectly well aware of the limitations of what they do. It’s really the media that want to portray everything in simple black and white terms. Some scientists play along, of course, as I comment upon below, but most of us are not priests but pragmatists.

Anyway, this episode gives me the excuse to point out that I ended a book I wrote in 1998 with a discussion of the image of science as a kind of priesthood, which it seems apt to repeat here. The book was about the famous eclipse expedition of 1919 that provided some degree of experimental confirmation of Einstein’s general theory of relativity and which I blogged about at some length on its 90th anniversary.

I decided to post the last few paragraphs here to show that I do think there is a valuable point to be made out of the scientist-as-priest idea. It’s to do with the responsibility scientists have to be honest about the limitations of their research and the uncertainties that surround any new discovery. Science has done great things for humanity, but it is fallible. Too many scientists are too certain about things that are far from proven. This can be damaging to science itself, as well as to the public perception of it. Bandwagons proliferate, stifling original ideas and leading to the construction of self-serving cartels. This is a fertile environment for conspiracy theories to flourish.

To my mind the thing  that really separates science from religion is that science is an investigative process, not a collection of truths. Each answer simply opens up more questions.  The public tends to see science as a collection of “facts” rather than a process of investigation. The scientific method has taught us a great deal about the way our Universe works, not through the exercise of blind faith but through the painstaking interplay of theory, experiment and observation.

This is what I wrote in 1998:

Science does not deal with ‘rights’ and ‘wrongs’. It deals instead with descriptions of reality that are either ‘useful’ or ‘not useful’. Newton’s theory of gravity was not shown to be ‘wrong’ by the eclipse expedition. It was merely shown that there were some phenomena it could not describe, and for which a more sophisticated theory was required. But Newton’s theory still yields perfectly reliable predictions in many situations, including, for example, the timing of total solar eclipses. When a theory is shown to be useful in a wide range of situations, it becomes part of our standard model of the world. But this doesn’t make it true, because we will never know whether future experiments may supersede it. It may well be the case that physical situations will be found where general relativity is supplanted by another theory of gravity. Indeed, physicists already know that Einstein’s theory breaks down when matter is so dense that quantum effects become important. Einstein himself realised that this would probably happen to his theory.

Putting together the material for this book, I was struck by the many parallels between the events of 1919 and coverage of similar topics in the newspapers of 1999. One of the hot topics for the media in January 1999, for example, has been the discovery by an international team of astronomers that distant exploding stars called supernovae are much fainter than had been predicted. To cut a long story short, this means that these objects are thought to be much further away than expected. The inference then is that not only is the Universe expanding, but it is doing so at a faster and faster rate as time passes. In other words, the Universe is accelerating. The only way that modern theories can account for this acceleration is to suggest that there is an additional source of energy pervading the very vacuum of space. These observations therefore hold profound implications for fundamental physics.

As always seems to be the case, the press present these observations as bald facts. As an astrophysicist, I know very well that they are far from unchallenged by the astronomical community. Lively debates about these results occur regularly at scientific meetings, and their status is far from established. In fact, only a year or two ago, precisely the same team was arguing for exactly the opposite conclusion based on their earlier data. But the media don’t seem to like representing science the way it actually is, as an arena in which ideas are vigorously debated and each result is presented with caveats and careful analysis of possible error. They prefer instead to portray scientists as priests, laying down the law without equivocation. The more esoteric the theory, the further it is beyond the grasp of the non-specialist, the more exalted is the priest. It is not that the public want to know – they want not to know but to believe.

Things seem to have been the same in 1919. Although the results from Sobral and Principe had then not received independent confirmation from other experiments, just as the new supernova experiments have not, they were still presented to the public at large as being definitive proof of something very profound. That the eclipse measurements later received confirmation is not the point. This kind of reporting can elevate scientists, at least temporarily, to the priesthood, but does nothing to bridge the ever-widening gap between what scientists do and what the public think they do.

As we enter a new Millennium, science continues to expand into areas still further beyond the comprehension of the general public. Particle physicists want to understand the structure of matter on tinier and tinier scales of length and time. Astronomers want to know how stars, galaxies  and life itself came into being. But not only is the theoretical ambition of science getting bigger. Experimental tests of modern particle theories require methods capable of probing objects a tiny fraction of the size of the nucleus of an atom. With devices such as the Hubble Space Telescope, astronomers can gather light that comes from sources so distant that it has taken most of the age of the Universe to reach us from them. But extending these experimental methods still further will require yet more money to be spent. At the same time that science reaches further and further beyond the general public, the more it relies on their taxes.

Many modern scientists themselves play a dangerous game with the truth, pushing their results one-sidedly into the media as part of the cut-throat battle for a share of scarce research funding. There may be short-term rewards, in grants and TV appearances, but in the long run the impact on the relationship between science and society can only be bad. The public responded to Einstein with unqualified admiration, but Big Science later gave the world nuclear weapons. The distorted image of scientist-as-priest is likely to lead only to alienation and further loss of public respect. Science is not a religion, and should not pretend to be one.

PS. You will note that I was voicing doubts about the interpretation of the early results from supernovae in 1998 that suggested the universe might be accelerating and that dark energy might be the reason for its behaviour. Although more evidence supporting this interpretation has since emerged from WMAP and other sources, I remain sceptical that we cosmologists are on the right track about this. Don’t get me wrong – I think the standard cosmological model is the best working hypothesis we have – I just think we’re probably missing some important pieces of the puzzle. I don’t apologise for that. I think sceptical is what a scientist should be.

Physics and Statistics

Posted in Bad Statistics, Education on August 16, 2013 by telescoper

Predictably, yesterday’s newspapers and other media were full of feeble articles about the A-level results, and I don’t just mean the gratuitous pictures of pretty girls opening envelopes and/or jumping in the air. I’ve never met a journalist who understood the concept of statistical significance, which seems to account for the way they feel able to write whatever they like about any numbers that happen to be newsworthy without feeling constrained by mathematical common sense. Sometimes it’s the ridiculous over-interpretation of opinion polls (which usually have a sampling uncertainty of ±3%), sometimes it’s league tables. This time it’s the number of students getting the top grades at A-level.

The BBC, for example, made a lot of fuss about the fall in the percentage of A and A* A-level grades, to 26.3% this year from 26.6% last year. Anyone with a modicum of statistical knowledge would know, however, that whether this drop means anything at all depends on how many results were involved: the sampling uncertainty in a count of size N is roughly √N, so the uncertainty in the corresponding percentage shrinks as the cohort grows. For a cohort of 300,000 this turns into a percentage uncertainty of about 0.57%, which is about twice as large as the reported fall. The result is therefore “in the noise” – in the sense that there’s no evidence that it was actually harder to get a high grade this year compared with last year – but that didn’t prove a barrier to those editors intent on filling their newspapers and websites with meaningless guff.

Almost hidden among the bilge was an interesting snippet about Physics. It seems that the number of students taking Physics A-level exceeded 35,000 in 2013, a threshold that had been set as a government target for 2014, so it has been reached a year early. The difference between the number that took Physics this year (35,569) and the number who took it in 2006 (27,368) is certainly significant. Whether this is the so-called Brian Cox effect or something else, it’s very good news for the future health of the subject.

On the other hand, the proportion of female Physics students remains around 20%. Over the last three years the proportion has been 20.8%, 21.3% and 20.6%, so this year’s figure is numerically down on last year’s, but the real message in these numbers is that, despite strenuous efforts to increase this fraction, there has been no significant change.

As I write I’m formally still on Clearing business, sitting beside the telephone in case anyone needs to talk to me. However, at close of play yesterday the School of Mathematical and Physical Sciences had exceeded its recruitment target by quite a healthy margin.  We’re still open for Clearing, though, as our recent expansion means we can take a few more suitably qualified students. Physics and Astronomy did particularly well, and we’re set to welcome our biggest-ever intake into the first year in September 2013. I’m really looking forward to meeting them all.

While I’m on about statistics, here’s another thing. When I was poring over this year’s NSS results, I noticed that only 39 Physics departments appeared in the survey. When I last counted, there were 115 universities in the UK, a figure that doesn’t include the 50 or so colleges and other higher education institutions which are also sometimes included in lists of universities. Anyway, my point is that at most about a third of British universities have a physics department.

Now that is a shocking statistic…

Tuesday’s Child

Posted in Bad Statistics, Cute Problems on July 5, 2013 by telescoper

I came across this little teaser this morning and thought I’d share it here.

I have two children, one of whom is a son born on a Tuesday. What is the probability that I have two boys?

Please select an answer from the possibilities listed in the poll below.

This is not a new problem and you can probably find the answer on the internet very quickly, but please try to work it out yourself before doing so. In other words, try thinking before you google! I’ll add a link to a discussion of this puzzle in due course…

UPDATE: Here’s the discussion that triggered this post. As you can see from the poll, most of you got it wrong!
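If you would rather check your answer by brute force than by conditional-probability algebra, here is a small enumeration sketch (my own, and it assumes each child is independently equally likely to be a boy or a girl and equally likely to be born on any day of the week):

```python
from itertools import product

sexes = ("boy", "girl")
days = range(7)               # 0 = Monday, 1 = Tuesday, ..., 6 = Sunday

# All equally likely (sex, birth day) combinations for an ordered pair of children
families = list(product(product(sexes, days), repeat=2))

# Keep only the families consistent with the stated information:
# at least one child is a boy born on a Tuesday
consistent = [f for f in families if ("boy", 1) in f]

# Of those, count the families with two boys
two_boys = [f for f in consistent if all(child[0] == "boy" for child in f)]

print(f"{len(two_boys)}/{len(consistent)} = {len(two_boys) / len(consistent):.3f}")
```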

Can We Actually Even Tell if Humans Are Affecting the Climate? What if we did nothing at all?

Posted in Bad Statistics on June 26, 2013 by telescoper

Reblog of a post about the doctrine of falsifiability and its relevance to Climate Change, following on from Monday’s post.

Evidence, Absence, and the Type II Monster

Posted in Bad Statistics on June 24, 2013 by telescoper

I was just having a quick lunchtime shufty at Dave Steele‘s blog. His latest post is inspired by the quotation “Absence of Evidence isn’t Evidence of Absence”, which can apparently be traced back to Carl Sagan. I never knew that. Anyway I was muchly enjoying the piece when I suddenly stumbled into this paragraph, which I quote without permission because I’m too shy to ask:

In a scientific experiment, the null hypothesis refers to a general or default position that there is no relationship between two measured phenomena. For example a well thought out point in an article by James Delingpole. Rejecting or disproving the null hypothesis is the primary task in any scientific research. If an experiment rejects the null hypothesis, it concludes that there are grounds greater than chance for believing that there is a relationship between the two (or more) phenomena being observed. Again the null hypothesis itself can never be proven. If participants treated with a medication are compared with untreated participants and there is found no statistically significant difference between the two groups, it does not prove that there really is no difference. Or if we say there is a monster in a Loch but cannot find it. The experiment could only be said to show that the results were not sufficient to reject the null hypothesis.

I’m going to pick up the trusty sword of Bayesian probability and have yet another go at the dragon of frequentism, but before doing so I’ll just correct the first sentence. The “null hypothesis” in a frequentist hypothesis test is not necessarily of the form described here: it could be of virtually any form, possibly quite different from the stated one of no correlation between two variables. All that matters is that (a) it has to be well-defined in terms of a model and (b) you have to be content to accept it as true unless and until you find evidence to the contrary. It’s true to say that there’s nowt as well-specified as nowt so nulls are often of the form “there is no correlation” or something like that, but the point is that they don’t have to be.

I note that the wikipedia page on “null hypothesis” uses the same wording as in the first sentence of the quoted paragraph, but this is not what you’ll find in most statistics textbooks. In their compendious three-volume work The Advanced Theory of Statistics, Kendall & Stuart even go as far as to say that the word “null” is misleading precisely because the hypothesis under test might be quite complicated, e.g. of composite nature.

Anyway, whatever the null hypothesis happens to be, the way a frequentist would proceed would be to calculate what the distribution of measurements would be if it were true. If the actual measurement is deemed to be unlikely (say that it is so high that only 1% of measurements would turn out that big under the null hypothesis) then you reject the null, in this case with a “level of significance” of 1%. If you don’t reject it then you tacitly accept it unless and until another experiment does persuade you to shift your allegiance.

But the significance level merely specifies the probability that you would reject the null hypothesis if it were correct. This is what you would call a Type I error. It says nothing at all about the probability that the null hypothesis is actually correct. To make that sort of statement you would need to specify an alternative hypothesis, calculate the distribution of measurements it predicts, and hence determine the statistical power of the test, i.e. the probability that you would reject the null hypothesis when it is actually false. To fail to reject the null hypothesis when it’s actually incorrect is to make a Type II error.
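To make the distinction between significance and power concrete, here is a minimal sketch (a toy example of my own, not taken from either blog post) for the simplest textbook case: a one-sided test of a null hypothesis under which the measurement is drawn from a standard normal distribution, against an alternative whose mean is shifted to 2.

```python
from scipy.stats import norm

alpha = 0.05                            # chosen significance level
null = norm(loc=0.0, scale=1.0)         # distribution of the measurement if the null is true
alternative = norm(loc=2.0, scale=1.0)  # an explicitly specified alternative

# Reject the null whenever the measurement exceeds this critical value
critical_value = null.ppf(1 - alpha)            # ~1.645

# Type I error: probability of rejecting the null when it is actually true
type_one = null.sf(critical_value)              # equals alpha by construction

# Type II error: probability of failing to reject the null when the alternative is true
type_two = alternative.cdf(critical_value)      # ~0.36

# Power: probability of correctly rejecting the null when the alternative is true
power = 1.0 - type_two                          # ~0.64

print(f"critical value = {critical_value:.3f}")
print(f"Type I = {type_one:.3f}, Type II = {type_two:.3f}, power = {power:.3f}")
```

Note that neither number is the probability that the null hypothesis is true; that question simply isn’t addressed in this framework.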

If all this stuff about significance, power and Type I and Type II errors seems a bit bizarre, I think that’s because it is. So is the notion, which stems from this frequentist formulation, that all a scientist can ever hope to do is refute their null hypothesis. You’ll find this view echoed in the philosophical approach of Karl Popper and it has heavily influenced the way many scientists see the scientific method, unfortunately.

The asymmetrical way that the null and alternative hypotheses are treated in the frequentist framework is not helpful, in my opinion. Far better to adopt a Bayesian framework in which probability represents the extent to which measurements or other data support a given theory. New statistical evidence can make two hypotheses either more or less probable relative to each other. The focus is not just on rejecting a specific model, but on comparing two or more models in a mutually consistent way. The key notion is not falsifiability, but testability. Data that fail to reject a hypothesis can properly be interpreted as supporting it, i.e. by making it more probable, but such reasoning can only be done consistently within the Bayesian framework.

What remains true, however, is that the null hypothesis (or indeed any other hypothesis) can never be proven with certainty; that is the case whenever reasoning is probabilistic. Sometimes, though, the weight of supporting evidence is so strong that inductive logic compels us to regard our theory or model or hypothesis as virtually certain. That applies whether the evidence consists of actual measurements or of non-detections; to a Bayesian, absence of evidence can be (and indeed often is) evidence of absence. The sun rises every morning and sets every evening; it is silly to argue that this provides us with no grounds for believing that it will do so tomorrow. Likewise, the sonar surveys and other investigations in Loch Ness provide us with evidence that favours the hypothesis that there isn’t a Monster over virtually every specific Monster hypothesis that has been suggested.
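As a toy illustration of how non-detections can pile up into evidence of absence, here is a minimal Bayesian sketch (the numbers are invented for the purpose, not a real analysis of any sonar data): suppose each independent survey would have detected a Monster, were one present, with probability 0.3, and never produces a false detection.

```python
# Toy Bayesian update: repeated non-detections as evidence of absence.
p_detect_if_monster = 0.3   # assumed chance that a single survey finds a Monster, if one exists
prior_monster = 0.5         # start from even prior odds, purely for the sake of argument
n_null_surveys = 10         # ten independent surveys, none of which found anything

# Likelihood of ten null results under each hypothesis
# (no false positives assumed, so "no Monster" predicts null results with certainty)
like_monster = (1 - p_detect_if_monster) ** n_null_surveys
like_no_monster = 1.0

# Bayes' theorem: posterior probability that a Monster exists
posterior_monster = (like_monster * prior_monster) / (
    like_monster * prior_monster + like_no_monster * (1 - prior_monster)
)

print(f"P(Monster | {n_null_surveys} null surveys) = {posterior_monster:.3f}")   # ~0.03
```

Each null result multiplies the odds in favour of “no Monster”, which is exactly the sense in which absence of evidence becomes evidence of absence.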

It is perfectly sensible to use this reasoning to infer that there is no Loch Ness Monster. Probably.

Bunn on Bayes

Posted in Bad Statistics on June 17, 2013 by telescoper

Just a quickie to advertise a nice blog post by Ted Bunn in which he takes down an article in Science by Bradley Efron, which is about frequentist statistics. I’ll leave it to you to read his piece, and the offending article, but couldn’t resist nicking his little graphic that sums up the matter for me:

[Ted Bunn’s graphic]

The point is that as scientists we are interested in the probability of a model (or hypothesis)  given the evidence (or data) arising from an experiment (or observation). This requires inverse, or inductive, reasoning and it is therefore explicitly Bayesian. Frequentists focus on a different question, about the probability of the data given the model, which is not the same thing at all, and is not what scientists actually need. There are examples in which a frequentist method accidentally gives the correct (i.e. Bayesian) answer, but they are nevertheless still answering the wrong question.
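Here is a minimal numerical illustration of why the two questions differ (a toy example of my own, nothing to do with Efron’s article or Bunn’s graphic): two models for a coin, with the observed data being 8 heads in 10 tosses. The likelihoods P(data|model) can be computed directly, but turning them into P(model|data) requires Bayes’ theorem and a prior.

```python
from math import comb

def likelihood(p_heads, heads=8, tosses=10):
    """P(data | model): probability of the observed data under a given model."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads)**(tosses - heads)

# The quantities frequentist methods deal in: probability of the data given each model
p_data_given_fair = likelihood(0.5)      # ~0.044
p_data_given_biased = likelihood(0.8)    # ~0.302

# The quantity a scientist actually wants, P(model | data), needs a prior over the models
prior_fair = prior_biased = 0.5
evidence = p_data_given_fair * prior_fair + p_data_given_biased * prior_biased
p_fair_given_data = p_data_given_fair * prior_fair / evidence

print(f"P(data | fair coin)   = {p_data_given_fair:.3f}")
print(f"P(data | biased coin) = {p_data_given_biased:.3f}")
print(f"P(fair coin | data)   = {p_fair_given_data:.3f}")   # ~0.13
```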

I will make one further comment arising from the following excerpt from the Efron piece.

Bayes’ 1763 paper was an impeccable exercise in probability theory. The trouble and the subsequent busts came from overenthusiastic application of the theorem in the absence of genuine prior information, with Pierre-Simon Laplace as a prime violator.

I think this is completely wrong. There is always prior information, even if it is minimal, but the point is that frequentist methods always ignore it even if it is “genuine” (whatever that means). It’s not always easy to encode this information in a properly defined prior probability of course, but at least a Bayesian will not deliberately answer the wrong question in order to avoid thinking about it.

It is ironic that the pioneers of probability theory, such as Laplace, adopted a Bayesian rather than frequentist interpretation for their probabilities. Frequentism arose during the nineteenth century and held sway until recently. I recall giving a conference talk about Bayesian reasoning only to be heckled by the audience with comments about “new-fangled, trendy Bayesian methods”. Nothing could have been less apt. Probability theory pre-dates the rise of sampling theory and all the frequentist-inspired techniques that modern-day statisticians like to employ and which, in my opinion, have added nothing but confusion to the scientific analysis of statistical data.

IQ in different academic fields – Interesting? Quite!

Posted in Bad Statistics on May 26, 2013 by telescoper

You all know how much I detest league tables, especially those that are based on entirely arbitrary criteria but nevertheless promote a feeling of smug self-satisfaction for those who are lucky enough to find themselves at the top. So when my attention was drawn to a blog post that shows (or purports to show) the variation of average IQ across different academic disciplines, I decided to post the corresponding ranking with the usual health warning that IQ tests only measure a subject’s ability to do IQ tests. This isn’t even based on IQ test results per se, but on a conversion between Graduate Record Examination (GRE) results and IQ which may be questionable. Moreover, the differences are really rather small and (as usual) no estimate of sampling uncertainty is provided.

Does this list mean that physicists are smarter than anyone else? You might say that. I couldn’t possibly comment…

  • 130.0 Physics
  • 129.0 Mathematics
  • 128.5 Computer Science
  • 128.0 Economics
  • 127.5 Chemical engineering
  • 127.0 Material science
  • 126.0 Electrical engineering
  • 125.5 Mechanical engineering
  • 125.0 Philosophy
  • 124.0 Chemistry
  • 123.0 Earth sciences
  • 122.0 Industrial engineering
  • 122.0 Civil engineering
  • 121.5 Biology
  • 120.1 English/literature
  • 120.0 Religion/theology
  • 119.8 Political science
  • 119.7 History
  • 118.0 Art history
  • 117.7 Anthropology/archeology
  • 116.5 Architecture
  • 116.0 Business
  • 115.0 Sociology
  • 114.0 Psychology
  • 114.0 Medicine
  • 112.0 Communication
  • 109.0 Education
  • 106.0 Public administration

Never mind the table, look at the sample size!

Posted in Bad Statistics on April 29, 2013 by telescoper

This morning I was just thinking that it’s been a while since I’ve filed anything in the category marked bad statistics when I glanced at today’s copy of the Times Higher and found something that’s given me an excuse to rectify my lapse. Last week saw the publication of said organ’s new Student Experience Survey which ranks  British Universities in order of the responses given by students to questions about various aspects of the teaching, social life and so  on. I had a go at this table a few years ago, but they still keep trotting it out. Here are the main results, sorted in decreasing order:

Rank University Score Resp.
1 University of East Anglia 84.8 119
2 University of Oxford 84.2 259
3 University of Sheffield 83.9 192
3 University of Cambridge 83.9 245
5 Loughborough University 82.8 102
6 University of Bath 82.7 159
7 University of Leeds 82.5 219
8 University of Dundee 82.4 103
9 York St John University 81.2 88
10 Lancaster University 81.1 100
11 University of Southampton 80.9 191
11 University of Birmingham 80.9 198
11 University of Nottingham 80.9 270
14 Cardiff University 80.8 113
14 Newcastle University 80.8 125
16 Durham University 80.3 188
17 University of Warwick 80.2 205
18 University of St Andrews 79.8 109
18 University of Glasgow 79.8 131
20 Queen’s University Belfast 79.2 101
21 University of Hull 79.1 106
22 University of Winchester 79 106
23 Northumbria University 78.9 100
23 University of Lincoln 78.9 103
23 University of Strathclyde 78.9 107
26 University of Surrey 78.8 102
26 University of Leicester 78.8 105
26 University of Exeter 78.8 130
29 University of Chester 78.7 102
30 Heriot-Watt University 78.6 101
31 Keele University 78.5 102
32 University of Kent 78.4 110
33 University of Reading 78.1 101
33 Bangor University 78.1 101
35 University of Huddersfield 78 104
36 University of Central Lancashire 77.9 121
37 Queen Mary, University of London 77.8 103
37 University of York 77.8 106
39 University of Edinburgh 77.7 170
40 University of Manchester 77.4 252
41 Imperial College London 77.3 148
42 Swansea University 77.1 103
43 Sheffield Hallam University 77 102
43 Teesside University 77 103
45 Brunel University 76.6 110
46 University of Portsmouth 76.4 107
47 University of Gloucestershire 76.3 53
47 Robert Gordon University 76.3 103
47 Aberystwyth University 76.3 104
50 University of Essex 76 103
50 University of Glamorgan 76 108
50 Plymouth University 76 112
53 University of Sunderland 75.9 100
54 Canterbury Christ Church University 75.8 102
55 De Montfort University 75.7 103
56 University of Bradford 75.5 52
56 University of Sussex 75.5 102
58 Nottingham Trent University 75.4 103
59 University of Roehampton 75.1 102
60 University of Ulster 75 101
60 Staffordshire University 75 102
62 Royal Veterinary College 74.8 50
62 Liverpool John Moores University 74.8 102
64 University of Bristol 74.7 137
65 University of Worcester 74.4 101
66 University of Derby 74.2 101
67 University College London 74.1 102
68 University of Aberdeen 73.9 105
69 University of the West of England 73.8 101
69 Coventry University 73.8 102
71 University of Hertfordshire 73.7 105
72 London School of Economics 73.5 51
73 Royal Holloway, University of London 73.4 104
74 University of Stirling 73.3 54
75 King’s College London 73.2 105
76 Bournemouth University 73.1 103
77 Southampton Solent University 72.7 102
78 Goldsmiths, University of London 72.5 52
78 Leeds Metropolitan University 72.5 106
80 Manchester Metropolitan University 72.2 104
81 University of Liverpool 72 104
82 Birmingham City University 71.8 101
83 Anglia Ruskin University 71.7 102
84 Glasgow Caledonian University 71.1 100
84 Kingston University 71.1 102
86 Aston University 71 52
86 University of Brighton 71 106
88 University of Wolverhampton 70.9 103
89 Oxford Brookes University 70.5 106
90 University of Salford 70.2 102
91 University of Cumbria 69.2 51
92 Napier University 68.8 101
93 University of Greenwich 68.5 102
94 University of Westminster 68.1 101
95 University of Bedfordshire 67.9 100
96 University of the Arts London 66 54
97 City University London 65.4 102
97 London Metropolitan University 65.4 103
97 The University of the West of Scotland 65.4 103
100 Middlesex University 65.1 104
101 University of East London 61.7 51
102 London South Bank University 61.2 50
Average scores 75.5 11459
YouthSight is the source of the data that have been used to compile the table of results for the Times Higher Education Student Experience Survey, and it retains the ownership of those data. Each higher education institution’s score has been indexed to give a percentage of the maximum score attainable. For each of the 21 attributes, students were given a seven-point scale and asked how strongly they agreed or disagreed with a number of statements based on their university experience.

My current employer, the University of Sussex, comes out right on the average (75.5) and is consequently in the middle of this league table. However, let’s look at this in a bit more detail. The number of students whose responses produced the score of 75.5 was just 102. That’s by no means the smallest sample in the survey, either. The University of Sussex has over 13,000 students, so the score in this table is obtained from less than 1% of the relevant student population. How representative can the results be, given that the sample is so incredibly small?

What is conspicuous by its absence from this table is any measure of the “margin-of-error” of the estimated score. What I mean by this is how much the sample score would change for Sussex if a different set of 102 students were involved. Unless every Sussex student scores exactly 75.5 then the score will vary from sample to sample. The smaller the sample, the larger the resulting uncertainty.

Given a survey of this type it should be quite straightforward to calculate the spread of scores from student to student within a sample from a given University in terms of the standard deviation, σ, as well as the mean score. Unfortunately, this survey does not include this information. However, let’s suppose for the sake of argument that the standard deviation for Sussex is quite small, say 10% of the mean value, i.e. 7.55. I imagine that it’s much larger than that, in fact, but this is just meant to be by way of an illustration.

If you have a sample size of N then the standard error of the mean is going to be roughly σ⁄√N, which, for Sussex, is about 0.75. Assuming everything has a normal distribution, this would mean that the “true” score for the full population of Sussex students has a 95% chance of being within two standard errors of the mean, i.e. between 74 and 77. This means Sussex could really be as high as 43rd place or as low as 67th, and that’s making very conservative assumptions about how much one student differs from another within each institution.
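For anyone who wants to reproduce the back-of-envelope numbers above, here is a minimal sketch using the same illustrative assumptions (a guessed standard deviation of 10% of the mean and a sample of 102 respondents):

```python
from math import sqrt

mean_score = 75.5            # Sussex's published score
sigma = 0.10 * mean_score    # guessed student-to-student spread (7.55), purely illustrative
n_respondents = 102

# Standard error of the mean for a sample of this size
standard_error = sigma / sqrt(n_respondents)          # ~0.75

# Rough 95% interval for the "true" population score, assuming normality
low, high = mean_score - 2 * standard_error, mean_score + 2 * standard_error

print(f"standard error = {standard_error:.2f}")
print(f"95% interval = ({low:.1f}, {high:.1f})")      # roughly (74.0, 77.0)
```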

That example is just for illustration, and the figures may well be wrong, but my main gripe is that I don’t understand how these guys can get away with publishing results like this without listing the margin of error at all. Perhaps it’s because that would make it obvious how unreliable the rankings are? Whatever the reason, we’d never get away with publishing results without errors in a serious scientific journal.

This sampling uncertainty almost certainly accounts for the big changes from year to year in these tables. For instance, the University of Lincoln is 23rd in this year’s table, but last year was way down in 66th place. Has something dramatic happened there to account for this meteoric rise? I doubt it. It’s more likely to be just a sampling fluctuation.

In fact I seriously doubt whether any of the scores in this table is significantly different from the mean score; the range from top to bottom is only 61 to 85, showing considerable uniformity across all 102 institutions listed. What a statistically literate person should take from this table is that (a) it’s a complete waste of time and (b) wherever you go to University you’ll probably have a good experience!