Archive for the Bad Statistics Category

Cauchy Statistics

Posted in Bad Statistics, The Universe and Stuff with tags , , , , on June 7, 2010 by telescoper

I was attempting to restore some sort of order to my office today when I stumbled across some old jottings about the Cauchy distribution, which is perhaps more familiar to astronomers as the Lorentz distribution. I never used them in the publication they related to, so I thought I’d just quickly pop the main idea on here in the hope that some amongst you might find it interesting and/or amusing.

What sparked this off is that the simplest cosmological models (including the particular one we now call the standard model) assume that the primordial density fluctuations we see imprinted in the pattern of temperature fluctuations in the cosmic microwave background and which we think gave rise to the large-scale structure of the Universe through the action of gravitational instability, were distributed according to Gaussian statistics (as predicted by the simplest versions of the inflationary universe theory).  Departures from Gaussianity would therefore, if found, yield important clues about physics beyond the standard model.

Cosmology isn’t the only place where Gaussian (normal) statistics apply. In fact they arise  generically,  in circumstances where variation results from the linear superposition of independent influences, by virtue of the Central Limit Theorem. Noise in experimental detectors is often treated as following Gaussian statistics, for example.

The Gaussian distribution has some nice properties that make it possible to place meaningful bounds on the statistical accuracy of measurements made in the presence of Gaussian fluctuations. For example, we all know that the margin of error on the mean value of a quantity determined from a sample of n independent Gaussian-distributed measurements varies as 1/\sqrt{n}; the larger the sample, the more accurately the global mean can be known. In the cosmological context this is basically why mapping a larger volume of space can lead, for instance, to a more accurate determination of the overall mean density of matter in the Universe.

However, although the Gaussian assumption often applies it doesn’t always apply, so if we want to think about non-Gaussian effects we have to think also about how well we can do statistical inference if we don’t have Gaussianity to rely on.

That’s why I was playing around with the peculiarities of the Cauchy distribution. This comes up in a variety of real physics problems so it isn’t an artificially pathological case. Imagine you have two independent variables X and Y each of which has a Gaussian distribution with zero mean and unit variance. The ratio Z=X/Y has a probability density function of the form

p(z)=\frac{1}{\pi(1+z^2)},

which is a form of the Cauchy distribution. There’s nothing at all wrong with this as a distribution – it’s not singular anywhere and integrates to unity as a pdf should. However, it does have a peculiar property that none of its moments is finite, not even the mean value!
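
If you want to check this numerically, here is a minimal Python sketch (using numpy, my choice rather than anything from the original post) that draws the ratio of two independent standard Gaussians and compares it with the Cauchy form above, using the fact that for this pdf P(|Z| < a) = (2/π) arctan(a).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Ratio of two independent standard Gaussians
z = rng.standard_normal(n) / rng.standard_normal(n)

# For the Cauchy pdf 1/[pi(1+z^2)], P(|Z| < a) = (2/pi) * arctan(a)
for a in (1.0, 3.0, 10.0):
    empirical = np.mean(np.abs(z) < a)
    theoretical = 2 / np.pi * np.arctan(a)
    print(f"P(|Z| < {a:>4}):  sample {empirical:.4f}   Cauchy {theoretical:.4f}")
```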

Following on from this property is the fact that Cauchy-distributed quantities violate the Central Limit Theorem. If we take n independent Gaussian variables then the distribution of the sum X_1+X_2 + \ldots + X_n has the normal form, but this is also true (for large enough n) for the sum of n independent variables having any distribution, as long as that distribution has finite variance.

The Cauchy distribution has infinite variance so the distribution of the sum of independent Cauchy-distributed quantities Z_1+Z_2 + \ldots Z_n doesn’t tend to a Gaussian. In fact the distribution of the sum of any number of  independent Cauchy variates is itself a Cauchy distribution. Moreover the distribution of the mean of a sample of size n does not depend on n for Cauchy variates. This means that making a larger sample doesn’t reduce the margin of error on the mean value!
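
Here is a similarly rough sketch (numpy again, assumed rather than taken from the post) of the point about sample means: the running mean of Gaussian samples settles down as the sample grows, while the Cauchy mean never does.

```python
import numpy as np

rng = np.random.default_rng(1)

# Compare running means: the Gaussian mean tightens up like 1/sqrt(n),
# while the Cauchy mean keeps lurching about however large the sample gets.
gauss = rng.standard_normal(1_000_000)
cauchy = rng.standard_cauchy(1_000_000)

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9}:  Gaussian mean {gauss[:n].mean():+.4f}   "
          f"Cauchy mean {cauchy[:n].mean():+.4f}")
```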

This was essentially the point I made in a previous post about the dangers of using standard statistical techniques – which usually involve the Gaussian assumption – to distributions of quantities formed as ratios.

We cosmologists should be grateful that we don’t seem to live in a Universe whose fluctuations are governed by Cauchy, rather than (nearly) Gaussian, statistics. Measuring more of the Universe wouldn’t be any use in determining its global properties as we’d always be dominated by cosmic variance.

 

Clustering in the Deep

Posted in Bad Statistics, The Universe and Stuff with tags , , , , , , on May 27, 2010 by telescoper

I couldn’t resist a quick lunchtime post about the results that have come out concerning the clustering of galaxies found by the HerMES collaboration using the Herschel Telescope. There’s quite a lengthy press release accompanying the new results, and there’s not much point in repeating the details here, so I’ll just show a wonderful image showing thousands of galaxies and their far-infrared colours.

Image Credit: European Space Agency, SPIRE and HERMES consortia

According to the press release, this looks “like grains of sand”. I wonder if whoever wrote the text was deliberately referring to Genesis 22:17?

.. they shall multiply as the stars of the heaven, and as the grains of sand upon the sea shore.

However, let me take issue a little with the following excerpt from said press release:

While at a first glance the galaxies look to be scattered randomly over the image, in fact they are not. A closer look will reveals that there are regions which have more galaxies in, and regions that have fewer.

A while ago I posted an item asking what “scattered randomly” is meant to mean. It included this picture

This is what a randomly-scattered set of points actually looks like. You’ll see that it also has some regions with more galaxies in them than others. Coincidentally, I showed the same picture again this morning in one of my postgraduate lectures on statistics, and a majority of the class – as I’m sure would many of you seeing it for the first time – thought it showed a clustered pattern. Whatever “randomness” means precisely, the word certainly implies some sort of variation, whereas the press release implies the opposite. I think a little re-wording might be in order.
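
The original picture isn’t reproduced here, but it’s easy to make one like it. A minimal sketch (numpy and matplotlib, my choice) that scatters points with independent uniform coordinates – a Poisson process – shows exactly the sort of apparent clumps and voids that fool people:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# A "randomly scattered" pattern: 1000 points with independent uniform
# coordinates (a Poisson process). Note the apparent clumps and voids --
# uniform in probability does not mean uniform in appearance.
x, y = rng.uniform(size=(2, 1000))
plt.scatter(x, y, s=4, color="k")
plt.gca().set_aspect("equal")
plt.show()
```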

What galaxy clustering statistics reveal is that the variation in density from place-to-place is greater than that expected in a random distribution like that shown. This has been known since the 1960s, so it’s not the result that these sources are clustered that’s so important. In fact, the preliminary clustering results from the HerMES surveys – described in a little more detail in a short paper available on the arXiv – are especially interesting because they show that some of the galaxies seen in this deep field are extremely bright (in the far-infrared), extremely distant, high-redshift objects which exhibit strong spatial correlations. The statistical form of this clustering provides very useful input for theorists trying to model the processes of galaxy formation and evolution. In particular, the brightest objects at high redshift have a propensity to appear preferentially in dense concentrations, making them even more strongly clustered than rank-and-file galaxies. This fact probably contains important information about the environmental factors responsible for driving their enormous luminosities.
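
For what it’s worth, the distinction between a genuinely clustered pattern and a random one can be quantified very simply by counts-in-cells: for a Poisson pattern the variance of the counts in cells equals the mean, whereas clustering pushes the variance above the mean. The following is only a toy illustration (numpy assumed; it is not the estimator used in the HerMES analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
npts, ncells = 20_000, 20  # 20 x 20 grid of cells on the unit square

def counts_in_cells(x, y, ncells):
    """Histogram points on an ncells x ncells grid of the unit square."""
    counts, _, _ = np.histogram2d(x, y, bins=ncells, range=[[0, 1], [0, 1]])
    return counts.ravel()

# Unclustered (Poisson) points: variance of cell counts ~ mean
x, y = rng.uniform(size=(2, npts))
poisson_counts = counts_in_cells(x, y, ncells)

# A crudely "clustered" pattern: points scattered around 200 random centres
centres = rng.uniform(size=(200, 2))
members = centres[rng.integers(0, 200, npts)] + 0.02 * rng.standard_normal((npts, 2))
cx, cy = members[:, 0] % 1.0, members[:, 1] % 1.0   # wrap onto the unit square
clustered_counts = counts_in_cells(cx, cy, ncells)

for name, c in [("Poisson", poisson_counts), ("clustered", clustered_counts)]:
    print(f"{name:>9}: mean count {c.mean():.1f},  variance {c.var():.1f}")
```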

The results are still preliminary, but we’re starting to see concrete evidence of the impact Herschel is going to have on extragalactic astrophysics.

General Purpose Election Blog Post

Posted in Bad Statistics, Politics with tags , , on April 14, 2010 by telescoper

A dramatic new <insert name of polling organization, e.g. GALLUP> opinion poll has revealed that the <insert name of political party> lead over <insert name of political party> has WIDENED/SHRUNK/NOT CHANGED dramatically. This almost certainly means a <insert name of political party> victory or a hung parliament. This contrasts with a recent <insert name of polling organization, e.g. YOUGOV> poll which showed that the <insert name of political party> lead had WIDENED/SHRUNK/NOT CHANGED which almost certainly meant a <insert name of political party> victory or a hung parliament.

Political observers were quick to point out that we shouldn’t read too much into this poll, as tomorrow’s <insert name of polling organization> poll shows the <insert name of political party> lead over <insert name of political party> has WIDENED/SHRUNK/NOT CHANGED dramatically, almost certainly meaning a <insert name of political party> victory or a hung parliament.

(adapted, without permission, from Private Eye)

Science’s Dirtiest Secret?

Posted in Bad Statistics, The Universe and Stuff with tags , , , on March 19, 2010 by telescoper

My attention was drawn yesterday to an article, in a journal I never read called American Scientist, about the role of statistics in science. Since this is a theme I’ve blogged about before I had a quick look at the piece and quickly came to the conclusion that the article was excruciating drivel. However, looking at it again today, my opinion of it has changed. I still don’t think it’s very good, but it didn’t make me as cross second time around. I don’t know whether this is because I was in a particularly bad mood yesterday, or whether the piece has been edited. But although it didn’t make me want to scream, I still think it’s a poor article.

Let me start with the opening couple of paragraphs

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

In terms of historical accuracy, the author, Tom Siegfried, gets off to a very bad start. Science didn’t get “seduced” by statistics.  As I’ve already blogged about, scientists of the calibre of Gauss and Laplace – and even Galileo – were instrumental in inventing statistics.

And what were the “modes of calculation that had served it so faithfully” anyway? Scientists have long  recognized the need to understand the behaviour of experimental errors, and to incorporate the corresponding uncertainty in their analysis. Statistics isn’t a “mutant form of math”, it’s an integral part of the scientific method. It’s a perfectly sound discipline, provided you know what you’re doing…

And that’s where, despite the sloppiness of his argument,  I do have some sympathy with some of what  Siegfried says. What has happened, in my view, is that too many people use statistical methods “off the shelf” without thinking about what they’re doing. The result is that the bad use of statistics is widespread. This is particularly true in disciplines that don’t have a well developed mathematical culture, such as some elements of biosciences and medicine, although the physical sciences have their own share of horrors too.

I’ve had a run-in myself with the authors of a paper in neurobiology who based extravagant claims on an inappropriate statistical analysis.

What is wrong is therefore not the use of statistics per se, but the fact that too few people understand – or probably even think about – what they’re trying to do (other than publish papers).

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Quite, but what does this mean for “science’s dirtiest secret”? Not that it involves statistical reasoning, but that large numbers of scientists haven’t a clue what they’re doing when they do a statistical test. And if this is the case with practising scientists, how can we possibly expect the general public to make sense of what is being said by the experts? No wonder people distrust scientists when so many results, confidently announced on the basis of totally spurious arguments, turn out to be wrong.

The problem is that the “standard” statistical methods shouldn’t be “standard”. It’s true that there are many methods that work in a wide range of situations, but simply assuming they will work in any particular one without thinking about it very carefully is a very dangerous strategy. Siegfried discusses examples where the use of “p-values” leads to incorrect results. It doesn’t surprise me that such examples can be found, as the misinterpretation of p-values is rife even in numerate disciplines, and matters get worse for those practitioners who combine p-values from different studies using meta-analysis, a method which has no mathematical motivation whatsoever and which should be banned. So indeed should a whole host of other frequentist methods which offer limitless opportunities to make a complete botch of the data arising from a research project.
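
A quick simulation makes the point about p-values rather starkly. In the sketch below (numpy and scipy assumed, and the numbers entirely made up) there is no real effect in any of the simulated experiments, yet a standard t-test flags about 5% of them as “significant”:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# 1000 "experiments" in which there is no real effect at all: each is a
# sample of 50 measurements drawn from a zero-mean Gaussian, tested for a
# non-zero mean with a standard one-sample t-test.
n_experiments, n_data = 1000, 50
data = rng.standard_normal((n_experiments, n_data))
_, pvalues = stats.ttest_1samp(data, popmean=0.0, axis=1)

print(f"'Significant' at p < 0.05: {np.sum(pvalues < 0.05)} of {n_experiments}")
# Roughly 50 of them -- a 5% false-alarm rate even though nothing is there.
```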

Siegfried goes on

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical.

Any single scientific study alone is quite likely to be incorrect. Really? Well, yes, if it is done incorrectly. But the point is not that studies are incorrect because they use statistics; they are incorrect because they are done incorrectly. Many scientists don’t even understand the statistics well enough to realise that what they’re doing is wrong.

If I had my way, scientific publications – especially in disciplines that impact directly on everyday life, such as medicine – should adopt a much more rigorous policy on statistical analysis and on the way statistical significance is reported. I favour the setting up of independent panels whose responsibility is to do the statistical data analysis on behalf of those scientists who can’t be trusted to do it correctly themselves.

Having started badly, and lost its way in the middle, the article ends disappointingly too. After leading us through a wilderness of failed frequentist analyses, Siegfried finally arrives at a discussion of the superior Bayesian methodology, but in irritatingly half-hearted fashion.

But Bayesian methods introduce a confusion into the actual meaning of the mathematical concept of “probability” in the real world. Standard or “frequentist” statistics treat probabilities as objective realities; Bayesians treat probabilities as “degrees of belief” based in part on a personal assessment or subjective decision about what to include in the calculation. That’s a tough placebo to swallow for scientists wedded to the “objective” ideal of standard statistics….

Conflict between frequentists and Bayesians has been ongoing for two centuries. So science’s marriage to mathematics seems to entail some irreconcilable differences. Whether the future holds a fruitful reconciliation or an ugly separation may depend on forging a shared understanding of probability.

The difficulty with this piece as a whole is that it reads as an anti-science polemic: “Some science results are based on bad statistics, therefore statistics is bad and science that uses statistics is bogus.” I don’t know whether that’s what the author intended, or whether it was just badly written.

I’d say the true state of affairs is different. A lot of bad science is published, and a lot of that science is bad because it uses statistical reasoning badly. You wouldn’t however argue that a screwdriver is no use because some idiot tries to hammer a nail in with one.

Only a bad craftsman blames his tools.

The Seven Year Itch

Posted in Bad Statistics, Cosmic Anomalies, The Universe and Stuff with tags , , , on January 27, 2010 by telescoper

I was just thinking last night that it’s been a while since I posted anything in the file marked cosmic anomalies, and this morning I woke up to find a blizzard of papers on the arXiv from the Wilkinson Microwave Anisotropy Probe (WMAP) team. These relate to an analysis of the latest data accumulated now over seven years of operation; a full list of the papers is given here.

I haven’t had time to read all of them yet, but I thought it was worth drawing attention to the particular one that relates to the issue of cosmic anomalies. I’ve taken the liberty of including the abstract here:

A simple six-parameter LCDM model provides a successful fit to WMAP data, both when the data are analyzed alone and in combination with other cosmological data. Even so, it is appropriate to search for any hints of deviations from the now standard model of cosmology, which includes inflation, dark energy, dark matter, baryons, and neutrinos. The cosmological community has subjected the WMAP data to extensive and varied analyses. While there is widespread agreement as to the overall success of the six-parameter LCDM model, various “anomalies” have been reported relative to that model. In this paper we examine potential anomalies and present analyses and assessments of their significance. In most cases we find that claimed anomalies depend on posterior selection of some aspect or subset of the data. Compared with sky simulations based on the best fit model, one can select for low probability features of the WMAP data. Low probability features are expected, but it is not usually straightforward to determine whether any particular low probability feature is the result of the a posteriori selection or of non-standard cosmology. We examine in detail the properties of the power spectrum with respect to the LCDM model. We examine several potential or previously claimed anomalies in the sky maps and power spectra, including cold spots, low quadrupole power, quadrupole-octupole alignment, hemispherical or dipole power asymmetry, and quadrupole power asymmetry. We conclude that there is no compelling evidence for deviations from the LCDM model, which is generally an acceptable statistical fit to WMAP and other cosmological data.

Since I’m one of those annoying people who have been sniffing around the WMAP data for signs of departures from the standard model, I thought I’d comment on this issue.

As the abstract says, the  LCDM model does indeed provide a good fit to the data, and the fact that it does so with only 6 free parameters is particularly impressive. On the other hand, this modelling process involves the compression of an enormous amount of data into just six numbers. If we always filter everything through the standard model analysis pipeline then it is possible that some vital information about departures from this framework might be lost. My point has always been that every now and again it is worth looking in the wastebasket to see if there’s any evidence that something interesting might have been discarded.

Various potential anomalies – mentioned in the above abstract – have been identified in this way, but usually there has turned out to be less to them than meets the eye. There are two reasons not to get too carried away.

The first reason is that no experiment – not even one as brilliant as WMAP – is entirely free from systematic artefacts. Before we get too excited and start abandoning our standard model for more exotic cosmologies, we need to be absolutely sure that we’re not just seeing residual foregrounds, instrument errors, beam asymmetries or some other effect that isn’t anything to do with cosmology. Because it has performed so well, WMAP has been able to do much more science than was originally envisaged, but every experiment is ultimately limited by its own systematics and WMAP is no different. There is some (circumstantial) evidence that some of the reported anomalies may be at least partly accounted for by  glitches of this sort.

The second point relates to basic statistical theory. Generally speaking, an anomaly A (some property of the data) is flagged as such because it is deemed to be improbable given a model M (in this case the LCDM). In other words the conditional probability P(A|M) is a small number. As I’ve repeatedly ranted about in my bad statistics posts, this does not necessarily mean that P(M|A) – the probability of the model being right – is small. If you look at 1000 different properties of the data, you have a good chance of finding something that happens with a probability of 1 in a thousand. This is what the abstract means by a posteriori reasoning: it’s not the same as talking out of your posterior, but is sometimes close to it.
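
A toy simulation (numpy and scipy assumed; nothing to do with the actual WMAP analysis) shows how a posteriori selection inflates apparent significance. Generate data exactly according to the assumed model, scan a thousand features of it, and the most extreme one will look impressively unlikely – until you ask how likely it is that something in the scan looks that odd:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# One simulated "sky": 1000 independent Gaussian features generated exactly
# according to the assumed model, with no anomaly put in by hand.
features = rng.standard_normal(1000)

# Scan the lot and pick out the most extreme feature a posteriori.
worst = np.max(np.abs(features))
p_single = 2 * norm.sf(worst)          # how unlikely it looks in isolation
p_scan = 1 - (1 - p_single) ** 1000    # chance that *some* feature looks that odd

print(f"most extreme feature: {worst:.2f} sigma")
print(f"P(this particular feature by chance)   ~ {p_single:.4f}")
print(f"P(at least one of 1000 looks this odd) ~ {p_scan:.2f}")
```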

In order to decide how seriously to take an anomaly, you need to work out P(M|A), the probability of the model given the anomaly, which requires that  you not only take into account all the other properties of the data that are explained by the model (i.e. those that aren’t anomalous), but also specify an alternative model that explains the anomaly better than the standard model. If you do this, without introducing too many free parameters, then this may be taken as compelling evidence for an alternative model. No such model exists -at least for the time being – so the message of the paper is rightly skeptical.

So, to summarize, I think what the WMAP team say is basically sensible, although I maintain that rummaging around in the trash is a good thing to do. Models are there to be tested and surely the best way to test them is to focus on things that look odd rather than simply congratulating oneself about the things that fit? It is extremely impressive that such intense scrutiny over the last seven years has revealed so few oddities, but that just means that we should look even harder.

Before too long, data from Planck will provide an even sterner test of the standard framework. We really do need an independent experiment to see whether there is something out there that WMAP might have missed. But we’ll have to wait a few years for that.

So far it’s WMAP 7 Planck 0, but there’s plenty of time for an upset. Unless they close us all down.

The League of Small Samples

Posted in Bad Statistics with tags , , , on January 14, 2010 by telescoper

This morning I was just thinking that it’s been a while since I’ve filed anything in the category marked bad statistics when I glanced at today’s copy of the Times Higher and found something that’s given me an excuse to rectify my lapse. Today saw the publication of said organ’s new Student Experience Survey which ranks  British Universities in order of the responses given by students to questions about various aspects of the teaching, social life and so  on. Here are the main results, sorted in decreasing order:

1 Loughborough University 84.9 128
2 University of Cambridge, The 82.6 259
3 University of Oxford, The 82.6 197
4 University of Sheffield, The 82.3 196
5 University of East Anglia, The 82.1 122
6 University of Wales, Aberystwyth 82.1 97
7 University of Leeds, The 81.9 185
8 University of Dundee, The 80.8 75
9 University of Southampton, The 80.6 164
10 University of Glasgow, The 80.6 136
11 University of Exeter, The 80.3 160
12 University of Durham 80.3 189
13 University of Leicester, The 79.9 151
14 University of St Andrews, The 79.9 104
15 University of Essex, The 79.5 65
16 University of Warwick, The 79.5 190
17 Cardiff University 79.4 180
18 University of Central Lancashire, The 79.3 88
19 University of Nottingham, The 79.2 233
20 University of Newcastle-upon-Tyne, The 78.9 145
21 University of Bath, The 78.7 142
22 University of Wales, Bangor 78.7 43
23 University of Edinburgh, The 78.1 190
24 University of Birmingham, The 78.0 179
25 University of Surrey, The 77.8 100
26 University of Sussex, The 77.6 49
27 University of Lancaster, The 77.6 123
28 University of Stirling, The 77.6 44
29 University of Wales, Swansea 77.5 61
30 University of Kent at Canterbury, The 77.3 116
30 University of Teesside, The 77.3 127
32 University of Hull, The 77.2 87
33 Robert Gordon University, The 77.2 57
34 University of Lincoln, The 77.0 121
35 Nottingham Trent University, The 76.9 192
36 University College Falmouth 76.8 40
37 University of Gloucestershire 76.8 74
38 University of Liverpool, The 76.7 89
39 University of Keele, The 76.5 57
40 University of Northumbria at Newcastle, The 76.4 149
41 University of Plymouth, The 76.3 190
41 University of Reading, The 76.3 117
43 Queen’s University of Belfast, The 76.0 149
44 University of Aberdeen, The 75.9 84
45 University of Strathclyde, The 75.7 72
46 Staffordshire University 75.6 85
47 University of York, The 75.6 121
48 St George’s Medical School 75.4 33
49 Southampton Solent University 75.2 34
50 University of Portsmouth, The 75.2 141
51 Queen Mary, University of London 75.2 104
52 University of Manchester 75.1 221
53 Aston University 75.0 66
54 University of Derby 75.0 33
55 University College London 74.8 114
56 Sheffield Hallam University 74.8 159
57 Glasgow Caledonian University 74.6 72
58 King’s College London 74.6 101
59 Brunel University 74.4 64
60 Heriot-Watt University 74.1 35
61 Imperial College of Science, Technology & Medicine 73.9 111
62 De Montfort University 73.6 83
63 Bath Spa University 73.4 64
64 Bournemouth University 73.3 128
65 University of the West of England, Bristol 73.3 207
66 Leeds Metropolitan University 73.1 143
67 University of Chester 72.5 61
68 University of Bristol, The 72.3 145
69 Royal Holloway, University of London 72.1 59
70 Canterbury Christ Church University 71.8 78
71 University of Huddersfield, The 71.8 97
72 York St John University College 71.8 31
72 University of Wales Institute, Cardiff 71.8 41
74 University of Glamorgan 71.6 84
75 University of Salford, The 71.2 58
76 Roehampton University 71.1 47
77 Manchester Metropolitan University, The 71.1 131
78 University of Northampton 70.8 42
79 University of Sunderland, The 70.8 61
80 Kingston University 70.7 121
81 University of Bradford, The 70.6 33
82 Oxford Brookes University 70.5 99
83 University of Ulster 70.3 61
84 Coventry University 69.9 82
85 University of Brighton, The 69.4 106
86 University of Hertfordshire 68.9 138
87 University of Bedfordshire 68.6 44
88 Queen Margaret University, Edinburgh 68.5 35
89 London School of Economics and Political Science 68.4 73
90 Royal Veterinary College, The 68.2 43
91 Anglia Ruskin University 68.1 71
92 Birmingham City University 67.7 109
93 University of Wolverhampton, The 67.5 72
94 Liverpool John Moores University 67.2 103
95 Goldsmiths College 66.9 42
96 Napier University 65.5 63
97 London South Bank University 64.9 44
98 City University 64.6 44
99 University of Greenwich, The 63.9 67
100 University of the Arts London 62.8 40
101 Middlesex University 61.4 51
102 University of Westminster, The 60.4 76
103 London Metropolitan University 55.2 37
104 University of East London, The 54.2 41
10465

The maximum overall score is 100 and the figure in the rightmost column is the number of students from that particular University that contributed to the survey. The total number of students involved is shown at the bottom, i.e. 10465.

My current employer, Cardiff University, comes out pretty well (17th) in this league table, but some do surprisingly poorly, such as Imperial, which is 61st. No doubt University spin doctors around the country will be working themselves into a frenzy trying to work out how best to present their showing in the list, but before they get too carried away I want to dampen their enthusiasm.

Let’s take Cardiff as an example. The number of students whose responses produced the score of 79.4 was just 180. That’s by no means the smallest sample in the survey, either. Cardiff University has approximately 20,000 undergraduates. The score in this table is therefore obtained from less than 1% of the relevant student population. How representative can the results be, given that the sample is so incredibly small?

What is conspicuous by its absence from this table is any measure of the “margin-of-error” of the estimated score. What I mean by this is how much the sample score would change for Cardiff if a different set of 180 students were involved. Unless every Cardiff student gives Cardiff exactly 79.4 then the score will vary from sample to sample. The smaller the sample, the larger the resulting uncertainty.

Given a survey of this type it should be quite straightforward to calculate the spread of scores from student to student within the sample for a given University, in terms of the standard deviation, σ, as well as the mean score. Unfortunately, this survey does not include this information. However, let’s suppose for the sake of argument that the standard deviation for Cardiff is quite small, say 10% of the mean value, i.e. 7.94. I imagine that it’s much larger than that, in fact, but this is just meant to be by way of an illustration.

If you have a sample size of  N then the standard error of the mean is going to be roughly (σ⁄√N) which, for Cardiff, is about 0.6. Assuming everything has a normal distribution, this would mean that the “true” score for the full population of Cardiff students has a 95% chance of being within two standard errors of the mean, i.e. between 78.2 and 80.6. This means Cardiff could really be as high as 9th place or as low as 23rd, and that’s making very conservative assumptions about how much one student differs from another within each institution.
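
For anyone who wants to check the arithmetic, here is the calculation as a few lines of Python (numpy assumed; the 7.94 is the purely illustrative standard deviation adopted above, not a figure taken from the survey):

```python
import numpy as np

# Cardiff's figures from the table, with the assumed (purely illustrative)
# student-to-student standard deviation of 10% of the mean.
mean_score = 79.4
sigma = 7.94        # assumed spread between individual students
n = 180             # sample size from the survey

standard_error = sigma / np.sqrt(n)
lower, upper = mean_score - 2 * standard_error, mean_score + 2 * standard_error

print(f"standard error ~ {standard_error:.2f}")
print(f"approximate 95% interval: {lower:.1f} to {upper:.1f}")
```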

That example is just for illustration, and the figures may well be wrong, but my main gripe is that I don’t understand how these guys can get away with publishing results like this without listing the margin of error at all. Perhaps it’s because that would make it obvious how unreliable the rankings are? Whatever the reason, we’d never get away with publishing results without errors in a serious scientific journal.

Still, at least there’s been one improvement since last year: the 2009 results gave every score to two decimal places! My A-level physics teacher would have torn strips off me if I’d done that!

Precision, you see, is not the same as accuracy….

Dark Squib

Posted in Bad Statistics, Science Politics, The Universe and Stuff with tags , on December 19, 2009 by telescoper

After today’s lengthy pre-Christmas traipse around Cardiff in the freezing cold, I don’t think I can summon up the energy for a lengthy post today. However, today’s cryogenic temperatures did manage to remind me that I hadn’t closed the book on a previous story about rumours of a laboratory detection of dark matter by the experiment known as CDMS. The main rumour – that there was going to be a paper in Nature reporting the definite detection of dark matter particles – turned out to be false, but there was a bit of truth after all, in that they did put out a paper yesterday (18th December, the date that the original rumour suggested their paper would come out).  There’s also an executive summary of the results here.

It turns out that the experiment has seen two events that might, just might, be the Weakly Interacting Massive Particles (WIMPs) that are most theorists’ favoured candidate for cold dark matter. However, they might also be due to background events generated by other stray particles getting into the works. It’s impossible to tell at this stage whether the signal is real or not. Based on the sort of naive frequentist statistical treatment of the data that particle physicists for some reason seem to prefer, there’s a 23% chance of their signal being background rather than dark matter. In other words, it’s about a one-sigma detection. In fact, if you factor in the possibility of a systematic error in the background counts – these are very difficult things to calibrate precisely – then the significance of the result decreases even further. And if you do it all properly, in a Bayesian way with an appropriate prior, then the most probable result is no detection. Andrew Jaffe gives some details on his blog.

There is no universally accepted criterion for what constitutes a definite detection, but I’ve been told recently by the editor of Nature himself that if it’s less than 3-sigma (a probability of about 1% of it arising) then they’re unlikely to publish it. If it’s 2-sigma (5%) then it’s interesting, but not conclusive, but at 1-sigma it’s not worth writing home about never mind writing a press release.
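
For reference, here is how sigma levels translate into Gaussian tail probabilities (a quick sketch using scipy, my choice; note that the percentages quoted above are rough, and the exact two-tailed values come out a little smaller):

```python
from scipy.stats import norm

# Two-tailed probability of a pure-noise Gaussian fluctuation at least this
# large, for the significance levels mentioned above.
for n_sigma in (1, 2, 3, 5):
    p = 2 * norm.sf(n_sigma)
    print(f"{n_sigma}-sigma: p ~ {p:.2%}")
```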

I should  add that none of their results has yet been subject to peer review either. I can only guess that CDMS must be undergoing a funding review pretty soon and wanted to use the media to show it was producing the goods. I can’t say I’m impressed with these antics, and I doubt if the reviewers will be either.

Unfortunately, the fact that this is all so inconclusive from a scientific point of view hasn’t stopped various organs getting hold of the wrong end of the stick and starting to beat about the bush with it. New Scientist‘s Twitter feed screamed

Clear signal of dark matter detected in Minnesota!

although the article itself was a bit better informed. The Guardian ran a particularly poor story,  impressive only in the way it crammed so many misconceptions into such a short piece.

This episode takes me back to a theme I’ve touched on many times on this blog, which is that scientific results are very rarely black-and-white and they have to be treated carefully in appropriate probabilistic terms. Unfortunately, the media and the public have a great deal of difficulty understanding the subtleties of this and what gets across in the public domain can be either garbled or downright misleading. Most often in science the correct answer isn’t “true” or “false” but somewhere in between.

Of course, with more measurements, better statistics and stronger control of systematics this CDMS result may well turn into a significant detection. If it does then it will be a great scientific breakthrough and they’ll have my congratulations straight away, tempered with a certain amount of sadness that there will be no UK competitors in the race owing to our recent savage funding cuts. But we’re not there yet. So far, it’s just a definite maybe.

The Monkey Complex

Posted in Bad Statistics, The Universe and Stuff with tags , , , , , on November 15, 2009 by telescoper

There’s an old story that if you leave a set of monkeys hammering on typewriters for a sufficiently long time then they will eventually reproduce the entire text of Shakespeare’s play Hamlet. It comes up in a variety of contexts, but the particular generalisation of this parable in cosmology is to argue that if we live in an enormously big universe (or “multiverse“), in which the laws of nature (as specified by the relevant fundamental constants) vary “sort of randomly” from place to place, then there will be a domain in which they have the right properties for life to evolve. This is one way of explaining away the apparent fine-tuning of the laws of physics: they’re not finely tuned, but we just live in a place where they allowed us to evolve. Although it may seem an easy step from monkeys to the multiverse, it always seemed to me a very shaky one.

For a start, let’s go back to the monkeys. The supposition that given an infinite time the monkeys must produce everything that’s possible in a finite sequence, is not necessarily true even if one does allow an infinite time. It depends on how they type. If the monkeys were always to hit two adjoining keys at the same time then they would never produce a script for Hamlet, no matter how long they typed for, as the combinations QW or ZX do not appear anywhere in that play. To guarantee what we need the kind their typing has to be ergodic, a very specific requirement not possessed by all “random” sequences.

A more fundamental problem is what is meant by randomness in the first place. I’ve actually commented on this before, in a post that still seems to be collecting readers so I thought I’d develop one or two of the ideas a little.

 It is surprisingly easy to generate perfectly deterministic mathematical sequences that behave in the way we usually take to characterize indeterministic processes. As a very simple example, consider the following “iteration” scheme:

 X_{j+1}= 2 X_{j} \mod(1)

If you are not familiar with the notation, the term mod(1) just means “drop the integer part”.  To illustrate how this works, let us start with a (positive) number, say 0.37. To calculate the next value I double it (getting 0.74) and drop the integer part. Well, 0.74 does not have an integer part so that’s fine. This value (0.74) becomes my first iterate. The next one is obtained by putting 0.74 in the formula, i.e. doubling it (1.48) and dropping  the integer part: result 0.48. Next one is 0.96, and so on. You can carry on this process as long as you like, using each output number as the input state for the following step of the iteration.

Now to simplify things a little bit, notice that, because we drop the integer part each time, all iterates must lie in the range between 0 and 1. Suppose I divide this range into two bins, labelled “heads” for X less than ½ and “tails” for X greater than or equal to ½. In my example above the first value of X is 0.37, which is “heads”. Next is 0.74 (tails); then 0.48 (heads), 0.96 (tails), and so on.

This sequence now mimics quite accurately the tossing of a fair coin. It produces a pattern of heads and tails with roughly 50% frequency in a long run. It is also difficult to predict the next term in the series given only the classification as “heads” or “tails”.

However, given the seed number which starts off the process, and of course the algorithm, one could reproduce the entire sequence. It is not random, but in some respects  looks like it is.
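
Here is the scheme as a few lines of Python (my own sketch, not code from the post). Exact rational arithmetic is used because ordinary floating point would betray the map’s simplicity rather quickly: doubling just shifts binary digits, so a float seed collapses to zero after about 53 steps.

```python
from fractions import Fraction

def doubling_map_tosses(seed, n):
    """Iterate x -> 2x mod 1, classifying each value as heads (< 1/2) or tails.

    Exact rational arithmetic avoids the float problem: in binary floating
    point this map just shifts bits and hits zero after ~53 doublings.
    """
    x = Fraction(seed)
    tosses = []
    for _ in range(n):
        tosses.append("H" if x < Fraction(1, 2) else "T")
        x = (2 * x) % 1
    return tosses

# Starting from 0.37 as in the text: H (0.37), T (0.74), H (0.48), T (0.96), ...
tosses = doubling_map_tosses(Fraction(37, 100), 50)
print("".join(tosses))
print("fraction of heads:", tosses.count("H") / len(tosses))
```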

One can think of “heads” or “tails” in more general terms, as indicating the “0” or “1” states in the binary representation of a number. This method can therefore be used to generate any sequence of digits. In fact algorithms like this one are used in computers for generating what are called pseudorandom numbers. They are not precisely random because computers can only do arithmetic to a finite number of decimal places. This means that only a finite number of possible sequences can be computed, so some repetition is inevitable, but these limitations are not always important in practice.

The ability to generate  random numbers accurately and rapidly in a computer has led to an entirely new way of doing science. Instead of doing real experiments with measuring equipment and the inevitable errors, one can now do numerical experiments with pseudorandom numbers in order to investigate how an experiment might work if we could do it. If we think we know what the result would be, and what kind of noise might arise, we can do a random simulation to discover the likelihood of success with a particular measurement strategy. This is called the “Monte Carlo” approach, and it is extraordinarily powerful. Observational astronomers and particle physicists use it a great deal in order to plan complex observing programmes and convince the powers that be that their proposal is sufficiently feasible to be allocated time on expensive facilities. In the end there is no substitute for real experiments, but in the meantime the Monte Carlo method can help avoid wasting time on flawed projects:

…in real life mistakes are likely to be irrevocable. Computer simulation, however, makes it economically practical to make mistakes on purpose.

(John McLeod and John Osborne, in Natural Automata and Useful Simulations).
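
As a concrete (and entirely made-up) example of the sort of planning calculation meant here, the sketch below simulates many realisations of a hypothetical noisy measurement to estimate how often a given observing strategy would yield a 3-sigma detection (numpy assumed; all the numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# A toy Monte Carlo planning exercise: would an experiment measuring a signal
# of strength 1.0, with Gaussian noise of standard deviation 5.0 per sample
# and n_samples samples, give a 3-sigma result? Simulate it many times.
signal, noise, n_samples, n_trials = 1.0, 5.0, 400, 10_000

data = signal + noise * rng.standard_normal((n_trials, n_samples))
detections = data.mean(axis=1) / (noise / np.sqrt(n_samples)) > 3.0

print(f"fraction of simulated experiments giving a 3-sigma detection: "
      f"{detections.mean():.2f}")
```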

So is there a way to tell whether a set of numbers is really random? Consider the following sequence:

1415926535897932384626433832795028841971

Is this a random string of numbers? There doesn’t seem to be a discernible pattern, and each possible digit seems to occur with roughly the same frequency. It doesn’t look like anyone’s phone number or bank account. Is that enough to make you think it is random?

Actually this is not at all random. If I had started it with a three and a decimal point you might have cottoned on straight away. “3.1415926…” gives the first few digits in the decimal representation of π. The full representation goes on forever without repeating. This is a sequence that satisfies most naïve definitions of randomness. It does, however, provide something of a hint as to how we might construct an operational definition, i.e. one that we can apply in practice to a finite set of numbers.

The key idea originates from the Russian mathematician Andrei Kolmogorov, who wrote the first truly rigorous mathematical work on probability theory in 1933. Kolmogorov’s approach was considerably ahead of its time, because it used many concepts that belong to the era of computers. In essence, what he did was to provide a definition of the complexity of an N-digit sequence in terms of the smallest amount of computer memory it would take to store a program capable of generating the sequence. Obviously one can always store the sequence itself, which means that there is always a program that occupies about as many bytes of memory as the sequence itself, but some numbers can be generated by codes much shorter than the numbers themselves. For example the sequence

111111111111111111111111111111111111

can be generated by the instruction to “print 1 35 times”, which can be stored in much less memory than the original string of digits. Such a sequence is therefore said to be algorithmically compressible.

There are many ways of calculating the digits of π numerically also, so although it may look superficially like a random string it is most definitely not random. It is algorithmically compressible.

The complexity of a sequence can be defined to be the length of the shortest program capable of generating it. If no algorithm can be found that compresses the sequence into a program shorter than itself then it is maximally complex and can suitably be defined as random. This is a very elegant description, and has good intuitive appeal.  

I’m not sure how compressible Hamlet is, but it’s certainly not entirely random. At any rate, when I studied it at school, I certainly wished it were a little shorter…
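
A crude practical stand-in for this idea is to see what a general-purpose compressor makes of a sequence – with the caveat, discussed below, that failure to compress proves nothing. A quick sketch using Python’s zlib and random modules (the pseudorandom digits merely stand in for the digits of π):

```python
import random
import zlib

# A run of identical digits versus a "patternless-looking" string of the same
# length (pseudorandom digits standing in for the digits of pi).
random.seed(0)
ones = "1" * 10_000
digits = "".join(random.choice("0123456789") for _ in range(10_000))

for name, s in [("all ones", ones), ("pseudorandom digits", digits)]:
    compressed = len(zlib.compress(s.encode(), 9))
    print(f"{name:>20}: {len(s):,} characters -> {compressed:,} bytes compressed")

# zlib squeezes the first string down to a few dozen bytes but can do little
# with the second -- even though the second was itself produced by a short
# program. That is exactly the caveat below: failing to compress a sequence
# is not proof that it is random.
```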

However, this still does not provide us with a way of testing rigorously whether a given finite sequence has been produced “randomly” or not.

If an algorithmic compression can be found then that means we declare the given sequence not to be  random. However we can never be sure if the next term in the sequence would fit with what our algorithm would predict. We have to argue, inferentially, that if we have fit a long sequence with a simple algorithm then it is improbable that the sequence was generated randomly.

On the other hand, if we fail to find a suitable compression that doesn’t mean it is random either. It may just mean we didn’t look hard enough or weren’t clever enough.

Human brains are good at finding patterns. When we can’t see one we usually take the easy way out and declare that none exists. We often model a complicated system as a random process because it is  too difficult to predict its behaviour accurately even if we know the relevant laws and have  powerful computers at our disposal. That’s a very reasonable thing to do when there is no practical alternative. 

It’s quite another matter, however,  to embrace randomness as a first principle to avoid looking for an explanation in the first place. For one thing, it’s lazy, taking the easy way out like that. And for another it’s a bit arrogant. Just because we can’t find an explanation within the framework of our current theories doesn’t mean more intelligent creatures than us won’t do so. We’re only monkeys, after all.

Godless Uncertainty

Posted in Bad Statistics with tags , , , , , , on November 5, 2009 by telescoper

As usual I’m a bit slow to comment on something that’s been the topic of much twittering and blogging over the past few days. This one is the terrible article by A.N. Wilson in, inevitably, the Daily Mail. I’ve already fumed once at the Mail and didn’t really want to go off the deep end again so soon after that. But here goes anyway. The piece by Wilson is a half-baked pile of shit not worth wasting energy investigating too deeply, but there are a few points I think it might be worth making even if I am a bit late with my rant.

The article is a response to the (justifiable) outcry after the government sacked Professor David Nutt, an independent scientific adviser, for having the temerity to give independent scientific advice. His position was Chair of the Advisory Council on the Misuse of Drugs, and his sin was to have pointed out the ludicrous inconsistency of government policies on drug abuse compared to other harmful activities such as smoking and drinking. The issues have been aired, protests lodged and other members of the Advisory Council have resigned in protest. Except to say I think the government’s position is indefensible I can’t add much here that hasn’t been said.

This is the background to Wilson’s article which is basically a backlash against the backlash. The (verbose) headline states

Yes, scientists do much good. But a country run by these arrogant gods of certainty would truly be hell on earth.

Obviously he’s not afraid of generalisation. All scientists are arrogant; everyone knows it because it says so in the Daily Mail. There’s another irony too. Nutt’s argument was all about the proper way to assess risk arising from drug use, and was appropriately phrased  in language not of certainty but of probability. But the Mail never lets truth get in the way of a good story.

He goes on

The trouble with a ‘scientific’ argument, of course, is that it is not made in the real world, but in a laboratory by an unimaginative academic relying solely on empirical facts.

It’s desperately sad that there are people – even moderately intelligent ones like Wilson – who think that’s what science is like. Unimaginative? Nothing could be further from the truth. It takes a great deal of imagination (and hard work) to come up with a theory. Few scientists have the imagination of an Einstein or a Feynman, but at least most of us recognize the importance of creativity in advancing knowledge.  But even imagination is not enough for a scientist. Once we have a beautiful hypothesis we must then try to subject it to rigorous quantitative testing. Even if we have spent years nurturing it, we have to let it die if it doesn’t fit the data. That takes courage and integrity too.

Imagination. Courage. Integrity. Not qualities ever likely to be associated with someone who writes for the Daily Mail.

That’s not to say that scientists are all perfect. We are human. Sometimes the process doesn’t work at all well. Mistakes are made. There is occasional misconduct. Researchers get too wedded to their pet theories. There can be measurement glitches. But the scientific method at least requires its practitioners to approach the subject rationally and objectively, taking into account all relevant factors and eschewing arguments based on sheer prejudice. You can see why Daily Mail writers don’t like scientists. Facts make them uncomfortable.

Wilson goes on to blame science for some of the atrocities perpetrated by Hitler:

Going back in time, some people think that Hitler invented the revolting experiments performed by Dr Mengele on human beings and animals.

But the Nazis did not invent these things. The only difference between Hitler and previous governments was that he believed, with babyish credulity, in science as the only truth. He allowed scientists freedoms which a civilised government would have checked.

Garbage. Hitler knew nothing about science. Had he done so he wouldn’t have driven out a huge proportion of the talented scientists in Germany’s universities and stuffed their departments full of ghoulish dolts who supported his prejudices.

It was only after reading the article that it was pointed out to me that this particularly offensive passage invoked Godwin’s Law: anyone who brings Hitler into an argument has already lost the debate.

Wilson’s piece seems to be a modern-day manifestation of old problem, famously expounded by C.P. Snow in his lecture on Two Cultures. The issue is that the overwhelming majority of people in positions of power and influence, including the media, are entirely illiterate from a scientific point of view. Science is viewed by most people with either incomprehension or suspicion (and sometimes both).

The more society relies on science and technology, the fewer people there seem to be who understand what science is or how it works. Moronic articles like Wilson’s indicate the depth of the problem. Who needs scientific literacy when you can get paid a large amount of money for writing sheer drivel?

I’m sure a great many scientists would agree with most of what I’ve said but I’d like to end with a comment that might be a bit more controversial. I do agree to some extent with Wilson, in that I think some scientists insist on claiming things are facts when they don’t have that status at all. I remember being on a TV programme in which a prominent cosmologist said that he thought the Big Bang was as real to him as the fact that the Sun is shining. I think it’s quite irrational to be that certain. Time and time again scientists present their work to the public in a language that suggests unshakeable self-belief. Sometimes they are badgered into doing that by journalists who want to simplify everything to a level they (and the public) can understand. But some don’t need any encouragement. Too many scientists are too comfortable presenting their profession as some sort of priesthood even if they do stop short of playing God.

The critical importance of dealing rationally with uncertainty in science, both within itself and in its relationship to society at large, was the principal issue I addressed in From Cosmos to Chaos, a paperback edition of which is about to be published by Oxford University Press.

From the jacket blurb:

Why do so many people think that science is about absolute certainty when, at its core, it is actually dominated by uncertainty?

I’ve blogged before about why I think scientists need to pay much more attention to the role of statistics and probability when they explain what they do to the wider world.

And to anyone who accuses me of using the occasion presented by Wilson’s article to engage in gratuitous marketing, I have only one answer:

BUY MY BOOK!

A Dutch Book

Posted in Bad Statistics with tags , , , on October 28, 2009 by telescoper

When I was a research student at Sussex University I lived for a time in Hove, close to the local Greyhound track. I soon discovered that going to the dogs could be both enjoyable and instructive. The card for an evening would usually consist of ten races, each involving six dogs. It didn’t take long for me to realise that it was quite boring to watch the greyhounds unless you had a bet, so I got into the habit of making small investments on each race. In fact, my usual bet would involve trying to predict both first and second place, the kind of combination bet which has longer odds and therefore generally has a better return if you happen to get it right.

[Image: the Tote board at the greyhound stadium]

The simplest way to bet is through a totalising pool system (called “The Tote”) in which the return on a successful bet  is determined by how much money has been placed on that particular outcome; the higher the amount staked, the lower the return for an individual winner. The Tote accepts very small bets, which suited me because I was an impoverished student in those days. The odds at any particular time are shown on the giant Tote Board you can see in the picture above.

However, every now and again I would place bets with one of the independent trackside bookies who set their own odds. Here the usual bet is for one particular dog to win, rather than on 1st/2nd place combinations. Sometimes these odds were much more generous than those that were showing on the Tote Board so I gave them a go. When bookies offer long odds, however, it’s probably because they know something the punters don’t and I didn’t win very often.

I often watched the bookmakers in action, chalking the odds up, sometimes lengthening them to draw in new bets or sometimes shortening them to discourage bets if they feared heavy losses. It struck me that they have to be very sharp when they change odds in this way because it’s quite easy to make a mistake that might result in a combination bet guaranteeing a win for a customer.

With six possible winners it takes a while to work out whether such a strategy exists, so to explain what I mean consider a race with just three competitors. The bookie assigns odds as follows: (1) even money; (2) 3/1 against; and (3) 4/1 against. The quoted odds imply probabilities to win of 50% (1 in 2), 25% (1 in 4) and 20% (1 in 5) respectively.

Now suppose you place three different bets: £100 on (1) to win, £50 on (2) and £40 on (3). Your total stake is then £190. If (1) succeeds you win £100 and also get your stake back; you lose the other stakes, but you have turned £190 into £200 so are up £10 overall. If (2) wins you also come out with £200: your £50 stake plus £150 for the bet. Likewise if (3) wins. You win whatever the outcome of the race. It’s not a question of being lucky, just that the odds have been designed inconsistently.

I stress that I never saw a bookie actually do this. If one did, he’d soon go out of business. An inconsistent set of odds like this is called a Dutch Book, and a bet which guarantees the bettor a positive return is often called a lock. It’s also the principle behind many share-trading schemes based on the idea of arbitrage.
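
Here is the three-dog example written out as a few lines of Python (my own sketch): the implied probabilities sum to 0.95 rather than 1, and the stakes above return £200 whichever dog wins.

```python
# The three-dog example above: implied probabilities and the stakes that
# guarantee a profit whatever happens.
odds = {"dog 1": 1.0, "dog 2": 3.0, "dog 3": 4.0}    # odds-against of x/1
stakes = {"dog 1": 100.0, "dog 2": 50.0, "dog 3": 40.0}

implied = {d: 1.0 / (1.0 + o) for d, o in odds.items()}
print("implied probabilities sum to", round(sum(implied.values()), 3))  # 0.95 < 1

total_staked = sum(stakes.values())
for winner in odds:
    # payout = stake back plus winnings on the winner; all other stakes lost
    payout = stakes[winner] * (1.0 + odds[winner])
    print(f"if {winner} wins: payout {payout:.0f}, profit {payout - total_staked:.0f}")
```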

It was only much  later I realised that there is a nice way of turning the Dutch Book argument around to derive the laws of probability from the principle that the odds be consistent, i.e. so that they do not lead to situations where a Dutch Book arises.

To see this, I’ll just generalise the above discussion a bit. Imagine you are a gambler interested in betting on the outcome of some event. If the game is fair, you would have expect to pay a stake px to win an amount x if the probability of the winning outcome is p.

Now  imagine that there are several possible outcomes, each with different probabilities, and you are allowed to bet a different amount on each of them. Clearly, the bookmaker has to be careful that there is no combination of bets that guarantees that you (the punter) will win.

Now consider a specific example. Suppose there are three possible outcomes; call them A, B, and C. Your bookie will accept the following bets: a bet on A with a payoff x_A, for which the stake is P_A x_A; a bet on B for which the return is x_B and the stake P_B x_B; and a bet on C with stake P_C x_C and payoff x_C.

Think about what happens in the special case where the events A and B are mutually exclusive (which just means that they can’t both happen) and C is just given by  A “OR” B, i.e. the event that either A or B happens. There are then three possible outcomes.

First, if A happens but B does not happen the net return to the gambler is

R=x_A(1-P_A)-x_BP_B+x_C(1-P_C).

The first term represents the difference between the stake and the return for the successful bet on A, the second is the lost stake corresponding to the failed bet on the event B, and the third term arises from the successful bet on C. The bet on C succeeds because if A happens then A “OR” B must happen too.

Alternatively, if B happens but A does not happen, the net return is

R=-x_A P_A -x_B(1-P_B)+x_C(1-P_C),

in a similar way to the previous result except that the bet on A loses, while those on B and C succeed.

Finally there is the possibility that neither A nor B succeeds: in this case the gambler does not win at all, and the return (which is bound to be negative) is

R=-x_AP_A-x_BP_B -x_C P_C.

Notice that A and B can’t both happen because I have assumed that they are mutually exclusive. For the game to be consistent (in the sense I’ve discussed above) we need to have

\textrm{det} \left( \begin{array}{ccc} 1- P_A & -P_B & 1-P_C \\ -P_A & 1-P_B & 1-P_C\\ -P_A & -P_B & -P_C \end{array} \right)=P_A+P_B-P_C=0.

This means that

P_C=P_A+P_B

so, since C is the event A “OR” B, this means that the probability that either of two mutually exclusive events A and B occurs is the sum of the separate probabilities of A and B. This is usually taught as one of the axioms from which the calculus of probabilities is derived, but what this discussion shows is that it can itself be derived in this way from the principle of consistency. It is the only way of combining probabilities that is consistent from the point of view of betting behaviour. Similar logic leads to the other rules of probability, including those for events which are not mutually exclusive.
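
To see the consistency condition at work numerically, here is a sketch (numpy assumed, and entirely my own illustration) that tries to construct a guaranteed winning set of stakes from the matrix above. When P_C = P_A + P_B the determinant vanishes and no such stakes exist; otherwise the system can be solved for stakes that pay out whatever happens.

```python
import numpy as np

def dutch_book_stakes(p_a, p_b, p_c, target=1.0):
    """Look for stakes (x_A, x_B, x_C) that return `target` to the punter in
    every one of the three outcomes considered above."""
    m = np.array([[1 - p_a,    -p_b, 1 - p_c],
                  [   -p_a, 1 - p_b, 1 - p_c],
                  [   -p_a,    -p_b,    -p_c]])
    if abs(np.linalg.det(m)) < 1e-12:   # det = p_a + p_b - p_c
        return None                     # consistent odds: no guaranteed win
    return np.linalg.solve(m, np.full(3, target))

print(dutch_book_stakes(0.5, 0.25, 0.75))  # consistent: None
print(dutch_book_stakes(0.5, 0.25, 0.60))  # inconsistent: a winning set of stakes
# A negative stake just means taking the bookmaker's side of that bet (laying
# it) -- the arbitrage idea mentioned earlier in the post.
```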

Notice that this kind of consistency has nothing to do with averages over a long series of repeated bets: if the rules are violated then the game itself is rigged.

A much more elegant and complete derivation of the laws of probability has been set out by Cox, but I find the Dutch Book argument a  nice practical way to illustrate the important difference between being unlucky and being irrational.

P.S. For legal reasons I should point out that, although I was a research student at the University of Sussex, I do not have a PhD. My doctorate is a DPhil.