Archive for the Bad Statistics Category

Godless Uncertainty

Posted in Bad Statistics on November 5, 2009 by telescoper

As usual I’m a bit slow to comment on something that’s been the topic of much twittering and blogging over the past few days. This one is the terrible article by A.N. Wilson in, inevitably, the Daily Mail. I’ve already fumed once at the Mail and didn’t really want to go off the deep end again so soon after that. But here goes anyway. The piece by Wilson is a half-baked pile of shit not worth wasting energy investigating too deeply, but there are a few points I think it might be worth making even if I am a bit late with my rant.

The article is a response to the (justifiable) outcry after the government sacked Professor David Nutt, an independent scientific adviser, for having the temerity to give independent scientific advice. His position was Chair of the Advisory Council on the Misuse of Drugs, and his sin was to have pointed out the ludicrous inconsistency of government policies on drug abuse compared to other harmful activities such as smoking and drinking. The issues have been aired, protests lodged and other members of the Advisory Council have resigned in protest. Except to say I think the government’s position is indefensible I can’t add much here that hasn’t been said.

This is the background to Wilson’s article which is basically a backlash against the backlash. The (verbose) headline states

Yes, scientists do much good. But a country run by these arrogant gods of certainty would truly be hell on earth.

Obviously he’s not afraid of generalisation. All scientists are arrogant; everyone knows it because it says so in the Daily Mail. There’s another irony too. Nutt’s argument was all about the proper way to assess risk arising from drug use, and was appropriately phrased  in language not of certainty but of probability. But the Mail never lets truth get in the way of a good story.

He goes on

The trouble with a ‘scientific’ argument, of course, is that it is not made in the real world, but in a laboratory by an unimaginative academic relying solely on empirical facts.

It’s desperately sad that there are people – even moderately intelligent ones like Wilson – who think that’s what science is like. Unimaginative? Nothing could be further from the truth. It takes a great deal of imagination (and hard work) to come up with a theory. Few scientists have the imagination of an Einstein or a Feynman, but at least most of us recognize the importance of creativity in advancing knowledge.  But even imagination is not enough for a scientist. Once we have a beautiful hypothesis we must then try to subject it to rigorous quantitative testing. Even if we have spent years nurturing it, we have to let it die if it doesn’t fit the data. That takes courage and integrity too.

Imagination. Courage. Integrity. Not qualities ever likely to be associated with someone who writes for the Daily Mail.

That’s not to say that scientists are all perfect. We are human. Sometimes the process doesn’t work at all well. Mistakes are made. There is occasional misconduct. Researchers get too wedded to their pet theories. There can be measurement glitches. But the scientific method at least requires its practitioners to approach the subject rationally and objectively, taking into account all relevant factors and eschewing arguments based on sheer prejudice. You can see why Daily Mail writers don’t like scientists. Facts make them uncomfortable.

Wilson goes on to blame science for some of the atrocities perpetrated by Hitler:

Going back in time, some people think that Hitler invented the revolting experiments performed by Dr Mengele on human beings and animals.

But the Nazis did not invent these things. The only difference between Hitler and previous governments was that he believed, with babyish credulity, in science as the only truth. He allowed scientists freedoms which a civilised government would have checked.

Garbage. Hitler knew nothing about science. Had he done so he wouldn’t have driven out a huge proportion of the talented scientists in Germany’s universities and stuffed their departments full of ghoulish dolts who supported his prejudices.

It was only after reading the article that it was pointed out to me that this particularly offensive passage invokes Godwin’s Law: anyone who brings Hitler into an argument has already lost the debate.

Wilson’s piece seems to be a modern-day manifestation of an old problem, famously expounded by C.P. Snow in his lecture on the Two Cultures. The issue is that the overwhelming majority of people in positions of power and influence, including the media, are entirely illiterate from a scientific point of view. Science is viewed by most people with either incomprehension or suspicion (and sometimes both).

As society becomes more reliant on science and technology, fewer and fewer people seem to understand what science is or how it works. Moronic articles like Wilson’s indicate the depth of the problem.
Who needs scientific literacy when you can get paid a large amount of money for writing sheer drivel?

I’m sure a great many scientists would agree with most of what I’ve said but I’d like to end with a comment that might be a bit more controversial. I do agree to some extent with Wilson, in that I think some scientists insist on claiming things are facts when they don’t have that status at all. I remember being on a TV programme in which a prominent cosmologist said that he thought the Big Bang was as real to him as the fact that the Sun is shining. I think it’s quite irrational to be that certain. Time and time again scientists present their work to the public in a language that suggests unshakeable self-belief. Sometimes they are badgered into doing that by journalists who want to simplify everything to a level they (and the public) can understand. But some don’t need any encouragement. Too many scientists are too comfortable presenting their profession as some sort of priesthood even if they do stop short of playing God.

The critical importance of dealing rationally with uncertainty in science, both within itself and in its relationship to society at large, was the principal issue I addressed in From Cosmos to Chaos, a paperback edition of which is about to be published by Oxford University Press.

From the jacket blurb:

Why do so many people think that science is about absolute certainty when, at its core, it is actually dominated by uncertainty?

I’ve blogged before about why I think scientists need to pay much more attention to the role of statistics and probability when they explain what they do to the wider world.

And to anyone who accuses me of using the occasion presented by Wilson’s article to engage in gratuitous marketing, I have only one answer:

BUY MY BOOK!

A Dutch Book

Posted in Bad Statistics on October 28, 2009 by telescoper

When I was a research student at Sussex University I lived for a time in Hove, close to the local Greyhound track. I soon discovered that going to the dogs could be both enjoyable and instructive. The card for an evening would usually consist of ten races, each involving six dogs. It didn’t take long for me to realise that it was quite boring to watch the greyhounds unless you had a bet, so I got into the habit of making small investments on each race. In fact, my usual bet would involve trying to predict both first and second place, the kind of combination bet which has longer odds and therefore generally has a better return if you happen to get it right.

[Photo: the Tote Board at the greyhound track]

The simplest way to bet is through a totalising pool system (called “The Tote”) in which the return on a successful bet  is determined by how much money has been placed on that particular outcome; the higher the amount staked, the lower the return for an individual winner. The Tote accepts very small bets, which suited me because I was an impoverished student in those days. The odds at any particular time are shown on the giant Tote Board you can see in the picture above.

However, every now and again I would place bets with one of the independent trackside bookies who set their own odds. Here the usual bet is for one particular dog to win, rather than on 1st/2nd place combinations. Sometimes these odds were much more generous than those that were showing on the Tote Board so I gave them a go. When bookies offer long odds, however, it’s probably because they know something the punters don’t and I didn’t win very often.

I often watched the bookmakers in action, chalking the odds up, sometimes lengthening them to draw in new bets or sometimes shortening them to discourage bets if they feared heavy losses. It struck me that they have to be very sharp when they change odds in this way because it’s quite easy to make a mistake that might result in a combination bet guaranteeing a win for a customer.

With six possible winners it takes a while to work out if there is such a strategy, but to explain what I mean consider a race with three competitors. The bookie assigns odds as follows: (1) even money; (2) 3/1 against; and (3) 4/1 against. The quoted odds imply probabilities to win of 50% (1 in 2), 25% (1 in 4) and 20% (1 in 5) respectively.

Now suppose you place three different bets: £100 on (1) to win, £50 on (2) and £40 on (3). Your total stake is then £190. If (1) succeeds you win £100 and also get your stake back; you lose the other stakes, but you have turned £190 into £200 so are up £10 overall. If (2) wins you also come out with £200: your £50 stake plus £150 for the bet. Likewise if (3) wins. You win whatever the outcome of the race. It’s not a question of being lucky, just that the odds have been designed inconsistently.

I stress that I never saw a bookie actually do this. If one did, he’d soon go out of business. An inconsistent set of odds like this is called a Dutch Book, and a bet which guarantees the bettor a positive return is often called a lock. It’s also the principle behind many share-trading schemes based on the idea of arbitrage.
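For concreteness, here is the worked example above as a short Python sketch (the helper names `implied_probability` and `dutch_book_stakes` are my own, purely for illustration): staking in proportion to the implied probabilities locks in the same £10 profit whichever dog wins.

```python
# Fractional odds of evens, 3/1 and 4/1 against imply "probabilities" of
# 1/2, 1/4 and 1/5, which sum to 0.95 < 1: an inconsistent book.

def implied_probability(odds_against):
    """Convert fractional odds (e.g. 3 for 3/1 against) to an implied probability."""
    return 1.0 / (odds_against + 1.0)

def dutch_book_stakes(odds, total_stake):
    """Split total_stake across the runners in proportion to implied probability."""
    probs = [implied_probability(o) for o in odds]
    book = sum(probs)   # less than 1 means a Dutch Book exists
    return [total_stake * p / book for p in probs], book

odds = [1, 3, 4]        # evens, 3/1 against, 4/1 against
stakes, book = dutch_book_stakes(odds, 190.0)
# A winning bet returns stake * (odds + 1); here every outcome pays £200.
payoffs = [s * (o + 1) for s, o in zip(stakes, odds)]
print(book)      # 0.95
print(stakes)    # [100.0, 50.0, 40.0]
print(payoffs)   # [200.0, 200.0, 200.0]
```

Whenever the implied probabilities sum to less than one, this proportional-staking trick guarantees a return of total_stake/book, which exceeds the stake.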

It was only much later that I realised there is a nice way of turning the Dutch Book argument around to derive the laws of probability from the principle that the odds be consistent, i.e. that they do not lead to situations in which a Dutch Book arises.

To see this, I’ll just generalise the above discussion a bit. Imagine you are a gambler interested in betting on the outcome of some event. If the game is fair, you would expect to pay a stake px to win an amount x if the probability of the winning outcome is p.

Now  imagine that there are several possible outcomes, each with different probabilities, and you are allowed to bet a different amount on each of them. Clearly, the bookmaker has to be careful that there is no combination of bets that guarantees that you (the punter) will win.

Now consider a specific example. Suppose there are three possible outcomes; call them A, B, and C. Your bookie will accept the following bets: a bet on A with a payoff x_A, for which the stake is P_A x_A; a bet on B for which the return is x_B and the stake P_B x_B; and a bet on C with stake P_C x_C and payoff x_C.

Think about what happens in the special case where the events A and B are mutually exclusive (which just means that they can’t both happen) and C is just given by  A “OR” B, i.e. the event that either A or B happens. There are then three possible outcomes.

First, if A happens but B does not happen the net return to the gambler is

R=x_A(1-P_A)-x_B P_B+x_C(1-P_C).

The first term represents the difference between the stake and the return for the successful bet on A, the second is the lost stake corresponding to the failed bet on the event B, and the third term arises from the successful bet on C. The bet on C succeeds because if A happens then A “OR” B must happen too.

Alternatively, if B happens but A does not happen, the net return is

R=-x_A P_A -x_B(1-P_B)+x_C(1-P_C),

in a similar way to the previous result except that the bet on A loses, while those on B and C succeed.

Finally there is the possibility that neither A nor B succeeds: in this case the gambler does not win at all, and the return (which is bound to be negative) is

R=-x_AP_A-x_BP_B -x_C P_C.

Notice that A and B can’t both happen because I have assumed that they are mutually exclusive. For the game to be consistent (in the sense I’ve discussed above) we need to have

\textrm{det} \left( \begin{array}{ccc} 1- P_A & -P_B & 1-P_C \\ -P_A & 1-P_B & 1-P_C\\ -P_A & -P_B & -P_C \end{array} \right)=P_A+P_B-P_C=0.

This means that

P_C=P_A+P_B

so, since C is the event A “OR” B, this means that the probability of the union of two mutually exclusive events A and B is the sum of the separate probabilities of A and B. This is usually taught as one of the axioms from which the calculus of probabilities is derived, but this discussion shows that it can itself be derived from the principle of consistency. It is the only way of combining probabilities that is consistent from the point of view of betting behaviour. Similar logic leads to the other rules of probability, including those for events which are not mutually exclusive.
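Those with no taste for expanding determinants by hand can check the consistency condition numerically. This little sketch (function names are mine, for illustration only) confirms that the determinant of the returns matrix equals P_A + P_B − P_C for arbitrary choices of the three probabilities:

```python
# Numerical check of the condition derived above:
# det(returns matrix) = P_A + P_B - P_C.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def returns_matrix(pa, pb, pc):
    """Rows are the gambler's returns per unit stake in the three cases:
    A wins, B wins, neither wins (A and B are mutually exclusive)."""
    return [[1 - pa, -pb, 1 - pc],
            [-pa, 1 - pb, 1 - pc],
            [-pa, -pb, -pc]]

pa, pb, pc = 0.3, 0.45, 0.6
print(det3(returns_matrix(pa, pb, pc)))   # 0.15 = pa + pb - pc
```

Setting the determinant to zero is exactly the requirement that no choice of stakes gives the punter a guaranteed win, and it forces P_C = P_A + P_B.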

Notice that this kind of consistency has nothing to do with averages over a long series of repeated bets: if the rules are violated then the game itself is rigged.

A much more elegant and complete derivation of the laws of probability has been set out by Cox, but I find the Dutch Book argument a  nice practical way to illustrate the important difference between being unlucky and being irrational.

P.S. For legal reasons I should point out that, although I was a research student at the University of Sussex, I do not have a PhD. My doctorate is a DPhil.

The League of Extraordinary Gibberish

Posted in Bad Statistics on October 13, 2009 by telescoper

After a very busy few days I thought I’d relax yesterday by catching up with a bit of reading. In last week’s Times Higher I found there was a supplement giving this year’s World University Rankings.

I don’t really approve of league tables but somehow can’t resist looking in them to see where my current employer Cardiff University lies. There we are at number 135 in the list of the top 200 Universities. That’s actually not bad for an institution that’s struggling with a Welsh funding system that seriously disadvantages it compared to our English colleagues. We’re a long way down compared to Cambridge (2nd), UCL (4th), Imperial and Oxford (5th=). Compared to places I’ve worked at previously we’re significantly below Nottingham (91st) but still above Queen Mary (164) and Sussex (166). Number 1 in the world is Harvard, which is apparently somewhere near Boston (the American one).

Relieved that we’re in the top 200 at all, I decided to have a look at how the tables were drawn up. I wish I hadn’t bothered because I was horrified at the methodological garbage that lies behind them. You can find a full account of the travesty here. In essence, however, the ranking is arrived at by adding six distinct indicators, weighted differently but with weights assigned for no obvious reason, each of which is arrived at by dubious means and is highly unlikely to measure what it purports to. Each indicator is magically turned into a score out of 100 before being added to all the other ones (with appropriate weighting factors).

The indicators are:

  1. Academic Peer Review. This is weighted 40% of the overall score for each institution and is obtained by asking a sample of academics (selected in a way that is not explained). This year 9386 people were involved; they were asked to name institutions they regard as the best in their field. This sample is a tiny fraction of the global academic population and it would amaze me if it were representative of anything at all!
  2. Employer Survey. The pollsters asked 3281 graduate employers for their opinions of the different universities. This was weighted 10%.
  3. Staff-Student Ratio. Counting 20%, this is supposed to be a measure of “teaching quality”! Good teaching = large numbers of staff? Not if most of them don’t teach as at many research universities. A large staff-student ratio could even mean the place is really unpopular!
  4. International Faculty. This measures the  proportion of overseas staff on the books. Apparently a large number of foreign lecturers makes for a good university and “how attractive an institution is around the world”. Or perhaps that it finds it difficult to recruit its own nationals. This one counts only 5%.
  5. International Students. Another 5% goes to the fraction of each of the student body that is from overseas.
  6. Research Excellence. This is measured solely on the basis of citations – I’ve discussed some of the issues with that before – and counts 20%. They choose to use an unreliable database called SCOPUS, run by the profiteering academic publisher Elsevier. The total number of citations is divided by the number of faculty to “give a sense of the density of research excellence” at the institution.

Well I hope by now you’ve got a sense of the density of the idiots who compiled this farrago. Even if you set aside the issue of the accuracy of the input data, there is still the issue of how on Earth anyone could have thought it was sensible to pick such silly ways of measuring what makes a good university, assigning random weights to them, and then claiming that they had achieved something useful. They probably got paid a lot for doing it too. Talk about money for old rope. I’m in the wrong business.

What gives the game away entirely is the enormous variance from one indicator to another. This means that changing the weights slightly would produce a drastically different list. And who is to say that the variables should be added linearly anyway? Is a score of 100 really worth precisely twice as much as a score of 50? What do the distributions look like? How significant are the differences in score from one institute to another? And what are we actually trying to measure anyway?

Here’s an example. The University of California at Berkeley scores 100/100 for indicators 1, 2 and 4, and 86 for 5. However, for Staff-Student Ratio (3) it gets a lowly 25/100 and for (6) it gets only 34, which combined take it down to 39th in the table. Exclude this curiously-chosen proxy for teaching quality and Berkeley would rocket up the table.
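To see how fragile the ordering is, here is a toy recalculation of Berkeley’s overall score from the quoted indicator values and the stated weights. The mapping of scores onto indicators follows the numbered list above, and the “alternative” weights are my own arbitrary invention, which is rather the point:

```python
# Berkeley's quoted indicator scores (order: peer review, employer survey,
# staff-student ratio, international faculty, international students, citations).
berkeley = {"peer": 100, "employer": 100, "staff_student": 25,
            "intl_faculty": 100, "intl_students": 86, "citations": 34}

# Weights as stated in the ranking's methodology.
published_weights = {"peer": 0.40, "employer": 0.10, "staff_student": 0.20,
                     "intl_faculty": 0.05, "intl_students": 0.05,
                     "citations": 0.20}

# An equally defensible-looking alternative: halve the weight on the
# staff-student proxy and move it elsewhere (an arbitrary choice, of course).
alternative_weights = {"peer": 0.45, "employer": 0.10, "staff_student": 0.10,
                       "intl_faculty": 0.05, "intl_students": 0.05,
                       "citations": 0.25}

def overall(scores, weights):
    """Weighted sum of indicator scores."""
    return sum(scores[k] * weights[k] for k in scores)

print(overall(berkeley, published_weights))    # ~71.1
print(overall(berkeley, alternative_weights))  # ~75.3
```

A four-point swing in the overall score from one defensible weighting to another is easily enough to shuffle an institution dozens of places in a table this tightly packed.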

Of course you can laugh these things off as unimportant trivia to be looked at with mild amusement over a glass of wine, but such things have increasingly found their way into the minds of managers and politicians. The fact that they are based on flawed assumptions, use a daft methodology, and produce utterly meaningless results seems to be irrelevant. Because they are based on numbers they must represent some kind of absolute truth.

There’s nothing at all wrong with collating and publishing information about schools and universities. Such facts should be available to the public. What is wrong is the manic obsession with  condensing disparate sets of conflicting data into a single number just so things can be ordered in lists that politicians can understand.

You can see the same thing going on in the national newspapers’ lists of University rankings. Each one uses a different weighting and different data and the lists are drastically different. They give different answers because nobody has even bothered to think about what the question is.

The Law of Unreason

Posted in Bad Statistics, The Universe and Stuff on October 11, 2009 by telescoper

Not much time to post today, so I thought I’d just put up a couple of nice little quotes about the Central Limit Theorem. In case you don’t know it, this theorem explains why so many phenomena result in measurable things whose frequencies of occurrence can be described by the Normal (Gaussian) distribution, with its characteristic Bell-shaped curve. I’ve already mentioned the role that various astronomers played in the development of this bit of mathematics, so I won’t repeat the story in this post.

In fact I was asked to prove the theorem during my PhD viva, and struggled to remember how to do it, but it’s such an important thing that it was quite reasonable for my examiners  to ask the question and quite reasonable for them to have expected me to answer it! If you want to know how to do it, then I’ll give you a hint: it involves a Fourier transform!
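Proof aside, the theorem is easy to watch in action. Here is a minimal simulation in Python (the sample sizes and seed are arbitrary choices of mine): sums of uniform random variables, which are individually anything but Gaussian, standardise to something very close to the normal curve.

```python
# Central Limit Theorem by simulation: standardised sums of uniforms.
import math
import random

random.seed(42)

def standardised_sum(n):
    """Sum n uniform(0,1) variables, then centre and scale to mean 0, variance 1.
    A single uniform(0,1) has mean 1/2 and variance 1/12."""
    s = sum(random.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)

samples = [standardised_sum(30) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# For a Gaussian, about 68.3% of samples lie within one standard deviation.
within_one_sigma = sum(abs(x) < 1 for x in samples) / len(samples)
print(round(mean, 2), round(var, 2), round(within_one_sigma, 2))
```

Feed the same machinery a heavy-tailed input, whose variance is not finite, and the agreement with the normal curve breaks down: exactly the failure of the theorem’s conditions discussed below.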

Any of you who took a peep at João Magueijo’s lecture that I posted about yesterday will know that the title of his talk was Anarchy and Physical Laws. The main issue he addressed was whether the existence of laws of physics requires that the Universe must have been designed or whether mathematical regularities could somehow emerge from a state of lawlessness. Why the Universe is lawful is of course one of the greatest mysteries of all, and one that, for some at least, transcends science and crosses over into the realm of theology.

In my little address at the end of João’s talk I drew an analogy with the Central Limit Theorem, which is an example of an emergent mathematical law that describes situations which are apparently extremely chaotic. I just wanted to make the point that there are well-known examples of such things, even if the audience were sceptical about applying such notions to the entire Universe.

The quotation I picked was this one from Sir Francis Galton:

I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error”. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along

However, it is worth remembering also that not everything has a normal distribution: the central limit theorem requires linear, additive behaviour of the variables involved. I posted about an example where this is not the case here. Theorists love to make the Gaussian assumption when dealing with phenomena that they want to model with stochastic processes because these make many calculations tractable that otherwise would be too difficult. In cosmology, for example, we usually assume that the primordial density perturbations that seeded the formation of large-scale structure obeyed Gaussian statistics. Observers and experimentalists frequently assume Gaussian measurement errors in order to apply off-the-shelf statistical methods to their results. Often nature is kind to us but every now and again we find anomalies that are inconsistent with the normal distribution. Those exceptions usually lead to clues that something interesting is going on that violates the terms of the Central Limit Theorem. There are inklings that this may be the case in cosmology.

So to balance Galton’s remarks, I add this quote by Gabriel Lippmann which I’ve taken the liberty of translating from the original French.

Everyone believes in the [normal] law of errors: the mathematicians, because they think it is an experimental fact; and the experimenters, because they suppose it is a theorem of mathematics

There are more things in heaven and earth than are described by the Gaussian distribution!

Astrostats

Posted in Bad Statistics, The Universe and Stuff on September 20, 2009 by telescoper

A few weeks ago I posted an item on the theme of how gambling games were good for the development of probability theory. That piece  contained a mention of one astronomer (Christiaan Huygens), but I wanted to take the story on a little bit to make the historical connection between astronomy and statistics more explicit.

Once the basics of mathematical probability had been worked out, it became possible to think about applying probabilistic notions to problems in natural philosophy. Not surprisingly, many of these problems were of astronomical origin but, on the way, the astronomers that tackled them also derived some of the basic concepts of statistical theory and practice. Statistics wasn’t just something that astronomers took off the shelf and used; they made fundamental contributions to the development of the subject itself.

The modern subject we now know as physics really began in the 16th and 17th century, although at that time it was usually called Natural Philosophy. The greatest early work in theoretical physics was undoubtedly Newton’s great Principia, published in 1687, which presented his idea of universal gravitation which, together with his famous three laws of motion, enabled him to account for the orbits of the planets around the Sun. But majestic though Newton’s achievements undoubtedly were, I think it is fair to say that the originator of modern physics was Galileo Galilei.

Galileo wasn’t as much of a mathematical genius as Newton, but he was highly imaginative, versatile and (very much unlike Newton) had an outgoing personality. He was also an able musician, fine artist and talented writer: in other words a true Renaissance man.  His fame as a scientist largely depends on discoveries he made with the telescope. In particular, in 1610 he observed the four largest satellites of Jupiter, the phases of Venus and sunspots. He immediately leapt to the conclusion that not everything in the sky could be orbiting the Earth and openly promoted the Copernican view that the Sun was at the centre of the solar system with the planets orbiting around it. The Catholic Church was resistant to these ideas. He was hauled up in front of the Inquisition and placed under house arrest. He died in the year Newton was born (1642).

These aspects of Galileo’s life are probably familiar to most readers, but hidden away among his scientific manuscripts and notebooks is an important first step towards a systematic method of statistical data analysis. Galileo performed numerous experiments, though he almost certainly did not carry out the one with which he is most commonly credited. He did establish that the speed at which bodies fall is independent of their weight, not by dropping things off the leaning tower of Pisa but by rolling balls down inclined slopes. In the course of his numerous forays into experimental physics Galileo realised that however careful he was taking measurements, the simplicity of the equipment available to him left him with quite large uncertainties in some of the results. He was able to estimate the accuracy of his measurements using repeated trials and sometimes ended up with a situation in which some measurements had larger estimated errors than others. This is a common occurrence in many kinds of experiment to this day.

Very often the problem we have in front of us is to measure two variables in an experiment, say X and Y. It doesn’t really matter what these two things are, except that X is assumed to be something one can control or measure easily and Y is whatever it is the experiment is supposed to yield information about. In order to establish whether there is a relationship between X and Y one can imagine a series of experiments where X is systematically varied and the resulting Y measured.  The pairs of (X,Y) values can then be plotted on a graph like the example shown in the Figure.

[Figure: measured (X,Y) pairs scattered about a straight line]

In this example it certainly looks like there is a straight line linking Y and X, but with small deviations above and below the line caused by the errors in measurement of Y. You could quite easily take a ruler and draw a line of “best fit” by eye through these measurements. I spent many a tedious afternoon in the physics labs doing this sort of thing when I was at school. Ideally, though, what one wants is some procedure for fitting a mathematical function to a set of data automatically, without requiring any subjective intervention or artistic skill. Galileo found a way to do this. Imagine you have a set of pairs of measurements (xi,yi) to which you would like to fit a straight line of the form y=mx+c. One way to do it is to find the line that minimizes some measure of the spread of the measured values around the theoretical line. The way Galileo did this was to work out the sum of the differences between the measured yi and the predicted values mxi+c at the measured values x=xi. He used the absolute difference |yi-(mxi+c)| so that the resulting optimal line would, roughly speaking, have as many of the measured points above it as below it. This general idea is now part of the standard practice of data analysis, and as far as I am aware, Galileo was the first scientist to grapple with the problem of dealing properly with experimental error.
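Galileo’s criterion is easy to state in modern terms. Here is a sketch (a crude grid search of my own devising, not anything Galileo actually did) that fits a line by minimising the sum of absolute deviations; the data are invented for illustration:

```python
# Fit y = m*x + c by minimising sum |yi - (m*xi + c)| over a grid of (m, c).
def sum_abs_dev(m, c, xs, ys):
    """Galileo-style criterion: total absolute deviation from the line."""
    return sum(abs(y - (m * x + c)) for x, y in zip(xs, ys))

xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]   # roughly y = 2x with small errors

best = min(((sum_abs_dev(m / 100, c / 100, xs, ys), m / 100, c / 100)
            for m in range(150, 251)      # slopes 1.50 .. 2.50 in steps of 0.01
            for c in range(-50, 51)),     # intercepts -0.50 .. 0.50
           key=lambda t: t[0])
print(best[1], best[2])   # slope and intercept near 2 and 0
```

A real solver would treat this as a linear programme rather than a grid search, but the principle of balancing points above and below the line is the same.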


The method used by Galileo was not quite the best way to crack the puzzle, but he had it almost right. It was again an astronomer who provided the missing piece and gave us essentially the same method used by statisticians (and astronomers) today.

Karl Friedrich Gauss was undoubtedly one of the greatest mathematicians of all time, so it might be objected that he wasn’t really an astronomer. Nevertheless he was director of the Observatory at Göttingen for most of his working life and was a keen observer and experimentalist. In 1809, he developed Galileo’s ideas into the method of least-squares, which is still used today for curve fitting.

This approach involves basically the same procedure except that it minimizes the sum of [yi-(mxi+c)]² rather than |yi-(mxi+c)|. This leads to a much more elegant mathematical treatment of the resulting deviations – the “residuals”. Gauss also did fundamental work on the mathematical theory of errors in general. The normal distribution is often called the Gaussian curve in his honour.
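One reason squared residuals are so much more elegant is that, for a straight line, the minimisation can be done in closed form via the normal equations. A minimal sketch, using the same invented data as before:

```python
# Least-squares straight-line fit via the normal equations.
def least_squares_line(xs, ys):
    """Return (m, c) minimising the sum of squared residuals [yi - (m*xi + c)]^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c

xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
m, c = least_squares_line(xs, ys)
print(round(m, 3), round(c, 3))   # slope and intercept near 2 and 0
```

No search is needed at all: differentiating the sum of squares with respect to m and c and setting both derivatives to zero gives the two linear equations solved above, which is precisely why the squared criterion displaced the absolute one.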

After Galileo, the development of statistics as a means of data analysis in natural philosophy was dominated by astronomers. I can’t possibly go systematically through all the significant contributors, but I think it is worth devoting a paragraph or two to a few famous names.

I’ve already mentioned Jakob Bernoulli, whose famous book on probability was probably written during the 1690s. But Jakob was just one member of an extraordinary Swiss family that produced at least 11 important figures in the history of mathematics. Among them was Daniel Bernoulli who was born in 1700. Along with the other members of his famous family, he had interests that ranged from astronomy to zoology. He is perhaps most famous for his work on fluid flows which forms the basis of much of modern hydrodynamics, especially Bernoulli’s principle, which accounts for changes in pressure as a gas or liquid flows along a pipe of varying width.
But the elder Jakob’s work on gambling clearly also had some effect on Daniel, as in 1735 the younger Bernoulli published an exceptionally clever study involving the application of probability theory to astronomy. It had been known for centuries that the orbits of the planets are confined to the same part of the sky as seen from Earth, a narrow band called the Zodiac. This is because the Earth and the planets orbit in approximately the same plane around the Sun. The Sun’s path in the sky as the Earth revolves also follows the Zodiac. We now know that the flattened shape of the Solar System holds clues to the processes by which it formed from a rotating cloud of cosmic debris that formed a disk from which the planets eventually condensed, but this idea was not well established in the time of Daniel Bernoulli. He set himself the challenge of figuring out what the chance was that the planets were orbiting in the same plane simply by chance, rather than because some physical process confined them to the plane of a protoplanetary disk. His conclusion? The odds against the inclinations of the planetary orbits being aligned by chance were, well, astronomical.
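A back-of-envelope version of Bernoulli’s calculation is easy to reproduce. In this sketch orbital poles are drawn uniformly at random on the sphere; the 7-degree band and the choice of five planets are illustrative assumptions of mine, not Bernoulli’s actual numbers:

```python
# How often would several randomly oriented orbits all lie near one plane?
import math
import random

random.seed(1)

def random_cos_inclination():
    """Cosine of the angle between a random orbital pole and a fixed axis.
    For poles uniform on the sphere this cosine is uniform on [-1, 1]."""
    return random.uniform(-1.0, 1.0)

def all_aligned(n_planets, max_incl_deg):
    """True if all n poles lie within max_incl_deg of the axis (either pole
    of a plane counts, since the two are equivalent)."""
    threshold = math.cos(math.radians(max_incl_deg))
    return all(abs(random_cos_inclination()) > threshold for _ in range(n_planets))

# For ONE random plane the chance of falling in the band is 1 - cos(7 deg),
# about 0.0075; for five independent planets, that to the fifth power: ~2e-11.
p_one = 1 - math.cos(math.radians(7))
print(p_one, p_one ** 5)

trials = 200000
hits = sum(all_aligned(5, 7) for _ in range(trials))
print(hits)   # essentially always 0 in any feasible number of trials
```

The Monte Carlo never finds an aligned configuration because the analytic probability is of order 10⁻¹¹: astronomical odds, just as Bernoulli concluded.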

The next “famous” figure I want to mention is not nearly as famous as he should be. John Michell was a Cambridge graduate in divinity who became a village rector near Leeds. His most important idea was the suggestion he made in 1783 that sufficiently massive stars could generate such a strong gravitational pull that light would be unable to escape from them. These objects are now known as black holes (although the name was coined much later by John Archibald Wheeler). In the context of this story, however, he deserves recognition for his use of a statistical argument that the number of close pairs of stars seen in the sky was far too large to arise by chance. He argued that they had to be physically associated, not fortuitous alignments. Michell is therefore credited with the discovery of double stars (or binaries), although compelling observational confirmation had to wait until William Herschel’s work of 1803.
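Michell’s argument can be recast in modern terms. Under the null hypothesis that stars are scattered independently over the sky, the expected number of chance pairs within a small angular separation follows from simple geometry. The catalogue size and separation below are illustrative choices of mine, not Michell’s actual figures:

```python
import math

def expected_chance_pairs(n_stars, max_sep_deg):
    """Expected number of pairs closer than max_sep_deg if n_stars are
    scattered independently and uniformly over the whole sky.

    The chance that one given star lies within angle theta of another
    is the fractional area of the spherical cap: (1 - cos(theta)) / 2.
    """
    theta = math.radians(max_sep_deg)
    p_pair = (1.0 - math.cos(theta)) / 2.0
    n_pairs = n_stars * (n_stars - 1) / 2.0
    return n_pairs * p_pair

# Illustrative: a catalogue of 5000 bright stars, and "close" pairs
# defined as those within one arcminute (1/60 degree) of each other.
print(expected_chance_pairs(5000, 1.0 / 60.0))
```

With well under one chance pair expected, observing even a handful of close pairs tells strongly in favour of physical association, which is essentially Michell’s point.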

It is impossible to overestimate the importance of the role played by Pierre Simon, Marquis de Laplace in the development of statistical theory. His book A Philosophical Essay on Probabilities, which began as an introduction to a much longer and more mathematical work, is probably the first time that a complete framework for the calculation and interpretation of probabilities ever appeared in print. First published in 1814, it is astonishingly modern in outlook.

Laplace began his scientific career as an assistant to Antoine Laurent Lavoisier, one of the founding fathers of chemistry. Laplace’s most important work was in astronomy, specifically in celestial mechanics, which involves explaining the motions of the heavenly bodies using the mathematical theory of dynamics. In 1796 he proposed the theory that the planets were formed from a rotating disk of gas and dust, which is in accord with the earlier assertion by Daniel Bernoulli that the planetary orbits could not be randomly oriented. In 1776 Laplace had also figured out a way of determining the average inclination of the planetary orbits.

A clutch of astronomers, including Laplace, also played important roles in the establishment of the Gaussian or normal distribution. I have already mentioned Gauss’s own part in this story, but other famous astronomers played their parts too. The importance of the Gaussian distribution owes a great deal to a mathematical property called the Central Limit Theorem: the distribution of the sum of a large number of independent variables tends towards the Gaussian form. Laplace proved a special case of this theorem in 1810, and Gauss himself also discussed it at length.
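A quick numerical illustration of the theorem (a sketch of my own, not anything from Laplace or Gauss): the sum of twelve independent uniform variables on [0, 1] has mean 6 and standard deviation exactly 1, and its distribution is already very close to Gaussian — indeed this was once a popular trick for generating approximately normal random numbers:

```python
import random
import statistics

random.seed(42)

# Each sample is the sum of 12 independent U(0,1) variables.
# Such a sum has mean 12 * (1/2) = 6 and variance 12 * (1/12) = 1.
samples = [sum(random.random() for _ in range(12)) for _ in range(20000)]

mean = statistics.fmean(samples)
stdev = statistics.stdev(samples)
print(f"mean ~ {mean:.3f}, stdev ~ {stdev:.3f}")

# Roughly 68% of a Gaussian lies within one standard deviation
# of the mean; the sums already reproduce this closely.
within_1sigma = sum(abs(x - mean) < stdev for x in samples) / len(samples)
print(f"fraction within 1 sigma ~ {within_1sigma:.3f}")
```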

A general proof of the Central Limit Theorem was finally furnished in 1838 by another astronomer, Friedrich Wilhelm Bessel – best known to physicists for the functions named after him – who in the same year also became the first person to measure a star’s distance using the method of parallax. Finally, the name “normal” distribution was coined in 1850 by yet another astronomer, John Herschel, son of William Herschel.

I hope this gets the message across that the histories of statistics and astronomy are very much intertwined. Aspiring young astronomers are often dismayed to discover, when they enter research, just how much statistics it involves. I’ve often complained that physics and astronomy education at universities usually includes almost nothing about statistics, despite the fact that it is the one thing you can guarantee to use as a researcher in practically any branch of the subject.

Over the years, statistics has come to be regarded as slightly disreputable by many physicists, perhaps echoing Rutherford’s comment along the lines of “If your experiment needs statistics, you ought to have done a better experiment”. That’s a silly statement anyway, because all experiments have some form of error that must be treated statistically, but it is particularly inapplicable to astronomy, which is not experimental but observational. Astronomers need to do statistics, and we owe it to the memory of all the great scientists I mentioned above to do our statistics properly.

Game Theory

Posted in Bad Statistics, Books, Talks and Reviews, The Universe and Stuff with tags , , , on September 5, 2009 by telescoper

Nowadays gambling is generally looked down on as something shady and disreputable, not to be discussed in polite company, or even to be banned altogether. However, the formulation of the basic laws of probability was almost exclusively inspired by their potential application to games of chance. Once established, these laws found a much wider range of applications in scientific contexts, including my own field of astronomy. I thought I’d illustrate this connection with a couple of examples. You may think that I’m just trying to make excuses for the fact that I also enjoy the odd bet every now and then!

Gambling in various forms has been around for millennia. Sumerian and Assyrian archaeological sites are littered with examples of a certain type of bone, called the astragalus (or talus bone). This is found just above the heel and its shape (in sheep and deer at any rate) is such that when it is tossed in the air it can land in any one of four possible orientations. It can therefore be used to generate “random” outcomes and is in many ways the forerunner of modern six-sided dice. The astragalus is known to have been used for gambling games as early as 3600 BC.


Unlike modern dice, which appeared around 2000 BC, the astragalus is not symmetrical, so the probability of it landing in each orientation is different. It is not thought that there was any mathematical understanding of how to calculate odds in games involving this object or its more symmetrical successors.

Games of chance also appear to have been commonplace in the time of Christ – Roman soldiers are supposed to have drawn lots at the crucifixion, for example – but there is no evidence of any really formalised understanding of the laws of probability at this time.

Playing cards emerged in China sometime during the tenth century AD and were available in western Europe by the 14th Century. This is an interesting development because playing cards can be used for games such as contract bridge which involve a great deal of pure skill as well as an element of randomness. Perhaps it is this aspect that finally got serious intellectuals (i.e. physicists) excited about probability theory.

The first book on probability that I am aware of was by Gerolamo Cardano. His Liber de Ludo Aleae (Book on Games of Chance) was published in 1663, but it was written more than a century before that date. Probability theory really got going in 1654 with a famous correspondence between the two great mathematicians Blaise Pascal and Pierre de Fermat, sparked off by a gambling addict called Antoine Gombaud, who styled himself the “Chevalier de Méré” (although he wasn’t actually a nobleman of any sort). The Chevalier de Méré had played a lot of dice games in his time and, although he didn’t have a rigorous mathematical theory of how they worked, he nevertheless felt he had an intuitive “feel” for what was a good bet and what wasn’t. In particular, he had done very well financially by betting at even money that he would roll at least one six in four rolls of a standard die.

It’s quite an easy matter to use the rules of probability to see why he was successful with this game. The probability that a single roll of a fair die yields a six is 1/6. The probability that it does not yield a six is therefore 5/6. The probability that four independent rolls produce no sixes at all is (the probability that the first roll is not a six) times (the probability that the second roll is not a six) times (the probability that the third roll is not a six) times (the probability that the fourth roll is not a six). Each of the probabilities involved in this multiplication is 5/6, so the result is (5/6)^4, which is 625/1296. But this is the probability of losing. The probability of winning is 1-625/1296 = 671/1296 = 0.5177, significantly higher than 50%. Since you’re more likely to win than lose, it’s a good bet.

So successful had this game been for de Méré that nobody would bet against him any more, and he had to think of another bet to offer. Using his “feel” for the dice, he reckoned that betting on one or more double-six in twenty-four rolls of a pair of dice at even money should also be a winner. Unfortunately for him, he started to lose heavily on this game and in desperation wrote to his friend Pascal to ask why. This set Pascal wondering, and he in turn started a correspondence about it with Fermat.

This strange turn of events led not only to the beginnings of a general formulation of probability theory, but also to the binomial distribution and the beautiful mathematical construction now known as Pascal’s Triangle.

The full story of this is recounted in the fascinating book shown above, but the immediate upshot for de Méré was that he abandoned this particular game.

To see why, just consider each throw of a pair of dice as a single “event”. There are 36 possible events corresponding to six possible outcomes on each of the dice (6×6=36). The probability of getting a double six in such an event is 1/36 because only one of the 36 events corresponds to two sixes. The probability of not getting a double six is therefore 35/36. The probability that a set of 24 independent fair throws of a pair of dice produces no double-sixes at all is therefore 35/36 multiplied by itself 24 times, or (35/36)^24. This is 0.5086, which is slightly higher than 50%. The probability that at least one double-six occurs is therefore 1-0.5086, or 0.4914. Our Chevalier has a less than 50% chance of winning, so an even money bet is not a good idea, unless he plans to use this scheme as a tax dodge.
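The two calculations above can be checked in a couple of lines; the code below simply restates the arithmetic already given in the text:

```python
# Probability of at least one six in four rolls of a single die.
p_first_bet = 1 - (5 / 6) ** 4

# Probability of at least one double-six in 24 rolls of a pair of dice.
p_second_bet = 1 - (35 / 36) ** 24

print(f"first bet:  {p_first_bet:.4f}")   # 0.5177 -- better than evens
print(f"second bet: {p_second_bet:.4f}")  # 0.4914 -- worse than evens
```

The gap between the two bets is only a few percent either side of evens, which is exactly why de Méré’s unaided intuition could tell that the first was profitable but not that the second was a loser.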

Both Fermat and Pascal made important contributions to many diverse aspects of scientific thought besides pure mathematics, including physics, but the first real astronomer to contribute to the development of probability in the context of gambling was Christiaan Huygens, the man who discovered the rings of Saturn in 1655. Two years after that famous astronomical discovery, he published a book called Calculating in Games of Chance, which introduced the concept of expectation. However, the fullest development of the statistical theory underlying games and gambling came with the publication in 1713 of Jakob Bernoulli’s wonderful treatise Ars Conjectandi, which did a great deal to establish the general mathematical theory of probability and statistics.

The Inductive Detective

Posted in Bad Statistics, Literature, The Universe and Stuff with tags , , , , , , , on September 4, 2009 by telescoper

I was watching an old episode of Sherlock Holmes last night – from the classic  Granada TV series featuring Jeremy Brett’s brilliant (and splendidly camp) portrayal of the eponymous detective. One of the  things that fascinates me about these and other detective stories is how often they use the word “deduction” to describe the logical methods involved in solving a crime.

As a matter of fact, what Holmes generally uses is not really deduction at all, but inference (a process which is predominantly inductive).

In deductive reasoning, one tries to tease out the logical consequences of a premise; the resulting conclusions are, generally speaking, more specific than the premise. “If these are the general rules, what are the consequences for this particular situation?” is the kind of question one can answer using deduction.

The kind of reasoning Holmes employs, however, is essentially the opposite of this. The question being answered is of the form: “From a particular set of observations, what can we infer about the more general circumstances relating to them?”. The following example from A Study in Scarlet is exactly of this type:

From a drop of water a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other.

The word “possibility” makes it clear that no certainty is attached to the actual existence of either the Atlantic or Niagara, but the implication is that observations of (and perhaps experiments on) a single drop of water could allow one to infer enough of the general properties of water to deduce the possible existence of other phenomena. The fundamental process is inductive rather than deductive, although deductions do play a role once general rules have been established.

In the example quoted there is  an inductive step between the water drop and the general physical and chemical properties of water and then a deductive step that shows that these laws could describe the Atlantic Ocean. Deduction involves going from theoretical axioms to observations whereas induction  is the reverse process.

I’m probably labouring this distinction, but the main point of doing so is that a great deal of science is fundamentally inferential and, as a consequence, it entails dealing with inferences (or guesses or conjectures) that are inherently uncertain as to their application to real facts. Dealing with these uncertain aspects requires a more general kind of logic than the  simple Boolean form employed in deductive reasoning. This side of the scientific method is sadly neglected in most approaches to science education.

In physics, the attitude is usually to establish the rules (“the laws of physics”) as axioms (though perhaps giving some experimental justification). Students are then taught to solve problems which generally involve working out particular consequences of these laws. This is all deductive, and I’ve got nothing against it: it is what a great deal of theoretical research in physics is actually like, and it forms an essential part of the training of a physicist.

However, one of the aims of physics – especially fundamental physics – is to try to establish what the laws of nature actually are from observations of particular outcomes. It would be simplistic to say that this was entirely inductive in character. Sometimes deduction plays an important role in scientific discoveries. For example, Albert Einstein deduced his Special Theory of Relativity from a postulate that the speed of light was constant for all observers in uniform relative motion. However, the motivation for this entire chain of reasoning arose from previous studies of electromagnetism, which involved a complicated interplay between experiment and theory that eventually led to Maxwell’s equations. Deduction and induction are both involved at some level in a kind of dialectical relationship.

The synthesis of the two approaches requires an evaluation of the evidence the data provide concerning the different theories. This evidence is rarely conclusive, so a wider range of logical possibilities than “true” or “false” needs to be accommodated. Fortunately, there is a quantitative and logically rigorous way of doing this: Bayesian probability. In this way of reasoning, the probability (a number between 0 and 1 attached to a hypothesis, model, or anything else that can be described as a logical proposition) represents the extent to which a given set of data supports the given hypothesis. The calculus of probabilities reduces to Boolean algebra only when the probabilities of all the hypotheses involved are either unity (certainly true) or zero (certainly false). In between “true” and “false” there are varying degrees of “uncertain”, represented by a number between 0 and 1, i.e. the probability.
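As a concrete sketch (a toy example of my own, not anything from the sources discussed above): suppose we have two rival hypotheses about a coin, fair versus biased, and we update their probabilities as tosses accumulate. The data steadily make one hypothesis more probable than the other — exactly the graded verdict that a bare true/false logic cannot express. The bias value 0.8 is an arbitrary illustrative choice:

```python
def bayes_update(prior_fair, n_heads, n_tails, p_biased=0.8):
    """Posterior probability that the coin is fair, given the data.

    Two hypotheses: 'fair' (P(heads) = 0.5) and 'biased'
    (P(heads) = p_biased). Bayes' theorem:
    P(H|D) is proportional to P(D|H) * P(H).
    """
    like_fair = 0.5 ** (n_heads + n_tails)
    like_biased = p_biased ** n_heads * (1 - p_biased) ** n_tails
    numerator = like_fair * prior_fair
    evidence = numerator + like_biased * (1 - prior_fair)
    return numerator / evidence

# Start agnostic, then observe 8 heads in 10 tosses.
posterior = bayes_update(prior_fair=0.5, n_heads=8, n_tails=2)
print(f"P(fair | 8 heads, 2 tails) = {posterior:.3f}")
```

The posterior is neither 0 nor 1: the evidence counts against fairness without refuting it, and more data would push the probability further in one direction or the other.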

Overlooking the importance of inductive reasoning has led to numerous pathological developments that have hindered the growth of science. One example is the widespread and remarkably naive devotion that many scientists have towards the philosophy of the anti-inductivist Karl Popper; his doctrine of falsifiability has led to an unhealthy neglect of  an essential fact of probabilistic reasoning, namely that data can make theories more probable. More generally, the rise of the empiricist philosophical tradition that stems from David Hume (another anti-inductivist) spawned the frequentist conception of probability, with its regrettable legacy of confusion and irrationality.

My own field of cosmology provides the largest-scale illustration of this process in action. Theorists make postulates about the contents of the Universe and the laws that describe it and try to calculate what measurable consequences their ideas might have. Observers make measurements as best they can, but these are inevitably restricted in number and accuracy by technical considerations. Over the years, theoretical cosmologists deductively explored the possible ways Einstein’s General Theory of Relativity could be applied to the cosmos at large. Eventually a family of theoretical models was constructed, each of which could, in principle, describe a universe with the same basic properties as ours. But determining which, if any, of these models applied to the real thing required more detailed data. For example, observations of the properties of individual galaxies led to the inferred presence of cosmologically important quantities of dark matter. Inference also played a key role in establishing the existence of dark energy as a major part of the overall energy budget of the Universe. The result is that we have now arrived at a standard model of cosmology which accounts pretty well for most relevant data.

Nothing is certain, of course, and this model may well turn out to be flawed in important ways. All the best detective stories have twists in which the favoured theory turns out to be wrong. But although the puzzle isn’t exactly solved, we’ve got good reasons for thinking we’re nearer to at least some of the answers than we were 20 years ago.

I think Sherlock Holmes would have approved.

Simpson’s Paradox

Posted in Bad Statistics with tags , , on August 30, 2009 by telescoper

 I haven’t put anything in the Bad Statistics  file for a while, so I thought I’d put this interesting little example up for your perusal.

Although my own field of modern cosmology requires a great deal of complicated statistical reasoning, cosmologists have it relatively easy because there is not much chance that any errors we make will actually end up harming anyone. Speculations about the Anthropic Principle or Theories of Everything are sometimes reported in the mass media but, if they are, and are garbled, the resulting confusion is unlikely to be fatal. The same cannot be said of the field of medical statistics. I can think of scores of examples where poor statistical reasoning has been responsible for shambles in the domain of public health.

Here’s an example of how a relatively simple statistical test can lead to total confusion. In this version, it is known as Simpson’s Paradox.

 A standard thing to do in a medical trial is to take a set of patients suffering from some condition and divide them into two groups. One group is given a treatment (T) and the other group is given a placebo; this latter group is called the control and I will denote it T* (no treatment).

To make things specific, suppose we have 100 patients, of whom 50 are actively treated and 50 form the control. Suppose that at the end of the trial each patient can be classified as recovered (“R”) or not recovered (“R*”). Consider the following outcome, displayed in a contingency table:

 

        R    R*   Total   Recovery
T      20    30     50      40%
T*     16    34     50      32%
Totals 36    64    100

 

 Clearly the recovery rate for those actively treated (40%) exceeds that for the control group, so the treatment seems at first sight to produce some benefit.

 Now let us divide the group into older and younger patients: the young group Y contains those under 50 years old (carefully defined so that I would belong to it) and Y* is those over 50.

 The following results are obtained for the young patients.

 

        R    R*   Total   Recovery
T      19    21     40      47.5%
T*      5     5     10      50%
Totals 24    26     50

The older group returns the following data: 

        R    R*   Total   Recovery
T       1     9     10      10%
T*     11    29     40      27.5%
Totals 12    38     50

 For each of the two groups separately, the recovery rate for the control exceeds that of the treated patients. The placebo works better than the treatment for the young and the old separately, but for the population as a whole the treatment seems to work better than the placebo!

This seems very confusing, and just think how many medical reports in newspapers contain results of this type: drinking red wine is good for you, eating meat is bad for you, and so on. What has gone wrong?

The key to this paradox is to note that many more of the younger patients are in the treatment group than in the control group, while the situation is reversed for the older patients. The result is to confound the effect of the treatment with a perfectly possible dependence of recovery on the age of the recipient. In essence this is a badly designed trial, but there is no doubting that it is a subtle effect, and not one that most people could understand without a great deal of careful explanation, which it is unlikely to get in the pages of a newspaper.
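The arithmetic of the trial is easy to reproduce from the tables above, and a few lines of code make the reversal plain to verify for yourself:

```python
# (recovered, not recovered) counts from the contingency tables above,
# split by treatment arm and age group.
trial = {
    ("T",  "young"): (19, 21),
    ("T",  "old"):   (1, 9),
    ("T*", "young"): (5, 5),
    ("T*", "old"):   (11, 29),
}

def recovery_rate(arm, groups=("young", "old")):
    """Fraction recovered in the given arm, summed over age groups."""
    recovered = sum(trial[(arm, g)][0] for g in groups)
    total = sum(sum(trial[(arm, g)]) for g in groups)
    return recovered / total

# Aggregated over all ages, the treatment looks better...
print(f"overall: T {recovery_rate('T'):.1%} vs T* {recovery_rate('T*'):.1%}")

# ...but within each age group the control does better.
print(f"young:   T {recovery_rate('T', ['young']):.1%} vs T* {recovery_rate('T*', ['young']):.1%}")
print(f"old:     T {recovery_rate('T', ['old']):.1%} vs T* {recovery_rate('T*', ['old']):.1%}")
```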

A Mountain of Truth

Posted in Bad Statistics, The Universe and Stuff with tags , , , , on August 1, 2009 by telescoper

I spent the last week at a conference in a beautiful setting amidst the hills overlooking the small town of Ascona by Lake Maggiore in the canton of Ticino, the Italian-speaking part of Switzerland. To be more precise we were located in a conference centre called the Centro Stefano Franscini on  Monte Verità. The meeting was COSMOSTATS which aimed

… to bring together world-class leading figures in cosmology and particle physics, as well as renowned statisticians, in order to exchange knowledge and experience in dealing with large and complex data sets, and to meet the challenge of upcoming large cosmological surveys.

Although I didn’t know much about the location beforehand, it turns out to have an extremely interesting history, going back about a hundred years. The first people to settle there, around the end of the 19th Century, were anarchists who had sought refuge during times of political upheaval; the Locarno region had long been a popular place for people with “alternative” lifestyles. Monte Verità (“The Mountain of Truth”) was eventually bought by Henri Oedenkoven, the son of a rich industrialist, who set up a sort of commune there at which the residents practised vegetarianism, naturism, free love and other forms of behaviour intended as a reaction against the scientific and technological progress of the time. From about 1904 onward the centre became a sanatorium where the discipline of psychoanalysis flourished, and it later attracted many artists. In 1927, Baron Eduard von der Heydt took the place over. He was a great connoisseur of Oriental philosophy and a keen collector of art, and he established a large collection at Monte Verità, much of which is still there: when the Baron died in 1956 he left Monte Verità to the local Canton.

Given the bizarre collection of anarchists, naturists, theosophists (and even vegetarians) that used to live in Monte Verità, it is by no means out of keeping with the tradition that it should eventually play host to a conference of cosmologists and statisticians.

The  conference itself was interesting, and I was lucky enough to get to chair a session with three particularly interesting talks in it. In general, though, these dialogues between statisticians and physicists don’t seem to be as productive as one might have hoped. I’ve been to a few now, and although there’s a lot of enjoyable polemic they don’t work too well at changing anyone’s opinion or providing new insights.

We may now have mountains of new data in cosmology and particle physics, but that hasn’t always translated into a corresponding mountain of truth. Intervening between our theories and observations lies the vexed question of how best to analyse the data and what the results actually mean. As always, lurking in the background was the long-running conflict between adherents of the Bayesian and frequentist interpretations of probability. It appears that cosmologists – at least those represented at this meeting – tend to be Bayesian while particle physicists are almost exclusively frequentist. I’ll refrain from commenting on what this might mean. However, I was perplexed by various comments made during the conference about the issue of coverage, which is discussed rather nicely in some detail here. To me the question of whether a Bayesian method has good frequentist coverage properties is completely irrelevant. Bayesian methods ask different questions (actually, ones to which scientists want to know the answer) so it is not surprising that they give different answers. Measuring a Bayesian method according to a frequentist criterion is completely pointless whichever camp you belong to.

The irrelevance of coverage was one thing that the previous residents of Monte Verità evidently understood rather better than some of the conference guests.

I’d like to thank  Uros Seljak, Roberto Trotta and Martin Kunz for organizing the meeting in such a  picturesque and intriguing place.

First Digits and Electoral Fraud in Iran

Posted in Bad Statistics with tags , , on June 22, 2009 by telescoper

An interesting issue has arisen recently about the possibility that the counting of the recent hotly contested Iranian election results might have been fraudulent. I mention it here because it involves  Benford’s Law – otherwise known as the First Digit Phenomenon – which I’ve blogged about before.

Apparently what started this off was a post on the arXiv by the cosmologist Boudewijn Roukema, but I first heard about it myself via a pingback from another wordpress blog. The same blogger has written a subsequent analysis here.

I’m not going to go into this in more detail here: the others involved have an enormous headstart and in any case I wouldn’t want to try to steal their thunder. Suffice to say that there is at least a suspicion that the distribution of first digits in the published results is more uniform than would be expected, given that the general behaviour under Benford’s Law is for more numbers to begin with the digit “1” than with any other. This apparently paradoxical result is quite easily explained. It also provides a way to check for fraud in, for example, tax returns. How it applies to election results is, however, not so clear and the analysis is a bit controversial.
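For anyone who wants to play with the idea: Benford’s Law predicts that a first digit d occurs with probability log10(1 + 1/d), so about 30% of values should begin with a 1. The sketch below (my own illustration, nothing to do with the Iranian data) shows that a multiplicative sequence such as the powers of 2 follows the law very closely, which is one way to build intuition for why it holds for quantities spanning several orders of magnitude:

```python
import math
from collections import Counter

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

# Benford's predicted probability for each first digit d.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Empirical first-digit frequencies for the first 1000 powers of 2.
counts = Counter(first_digit(2 ** k) for k in range(1, 1001))
empirical = {d: counts[d] / 1000 for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: predicted {benford[d]:.3f}, observed {empirical[d]:.3f}")
```

A suspiciously uniform first-digit distribution — each digit appearing about a ninth of the time — is the kind of departure from this logarithmic pattern that the election analyses were looking for.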

I’m sure some of you out there will have time to look at this in more detail so I encourage you to do so…

 

Oh. The story is gathering momentum elsewhere too. See here.