Archive for the Bad Statistics Category

The League of Extraordinary Gibberish

Posted in Bad Statistics on October 13, 2009 by telescoper

After a very busy few days I thought I’d relax yesterday by catching up with a bit of reading. In last week’s Times Higher I found there was a supplement giving this year’s World University Rankings.

I don’t really approve of league tables but somehow can’t resist looking in them to see where my current employer Cardiff University lies. There we are at number 135 in the list of the top 200 Universities. That’s actually not bad for an institution that’s struggling with a Welsh funding system that seriously disadvantages it compared to our English colleagues. We’re a long way down compared to Cambridge (2nd), UCL (4th), Imperial and Oxford (5th=). Compared to places I’ve worked at previously we’re significantly below Nottingham (91st) but still above Queen Mary (164) and Sussex (166). Number 1 in the world is Harvard, which is apparently somewhere near Boston (the American one).

Relieved that we’re in the top 200 at all, I decided to have a look at how the tables were drawn up. I wish I hadn’t bothered because I was horrified at the methodological garbage that lies behind it. You can find a full account of the travesty here. In essence, however, the ranking is arrived at by adding together six distinct indicators, each weighted differently but with the weights assigned for no obvious reason, each obtained by dubious means, and each highly unlikely to measure what it purports to. Each indicator is magically turned into a score out of 100 before being added to all the others (with the appropriate weighting factors).

The indicators are:

  1. Academic Peer Review. This is weighted 40% of the overall score for each institution and is obtained by asking a sample of academics (selected in a way that is not explained). This year 9386 people were involved; they were asked to name institutions they regard as the best in their field. This sample is a tiny fraction of the global academic population and it would amaze me if it were representative of anything at all!
  2. Employer Survey. The pollsters asked 3281 graduate employers for their opinions of the different universities. This was weighted 10%.
  3. Staff-Student Ratio. Counting 20%, this is supposed to be a measure of “teaching quality”! Good teaching = large numbers of staff? Not if most of them don’t teach, as at many research universities. A large staff-student ratio could even mean the place is really unpopular!
  4. International Faculty. This measures the proportion of overseas staff on the books. Apparently a large number of foreign lecturers makes for a good university, indicating “how attractive an institution is around the world”. Or perhaps that it finds it difficult to recruit its own nationals. This one counts only 5%.
  5. International Students. Another 5% goes to the fraction of the student body that is from overseas.
  6. Research Excellence. This is measured solely on the basis of citations – I’ve discussed some of the issues with that before – and counts 20%. They choose to use an unreliable database called SCOPUS, run by the profiteering academic publisher Elsevier. The total number of citations is divided by the number of faculty to “give a sense of the density of research excellence” at the institution.

Well I hope by now you’ve got a sense of the density of the idiots who compiled this farrago. Even if you set aside the issue of the accuracy of the input data, there is still the issue of how on Earth anyone could have thought it was sensible to pick such silly ways of measuring what makes a good university, assigning random weights to them, and then claiming that they had achieved something useful. They probably got paid a lot for doing it too. Talk about money for old rope. I’m in the wrong business.

What gives the game away entirely is the enormous variance from one indicator to another. This means that changing the weights slightly would produce a drastically different list. And who is to say that the variables should be added linearly anyway? Is a score of 100 really worth precisely twice as much as a score of 50? What do the distributions look like? How significant are the differences in score from one institute to another? And what are we actually trying to measure anyway?

Here’s an example. The University of California at Berkeley scores 100/100 for indicators 1, 2 and 4, and 86 for indicator 5. However, for staff-student ratio (3) it gets a lowly 25/100 and for citations (6) it gets only 34, which combine to take it down to 39th in the table. Exclude this curiously-chosen proxy for teaching quality and Berkeley would rocket up the table.
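Just to make the arithmetic concrete, here is a minimal sketch of the weighted-sum calculation in Python, using only the weights and the Berkeley scores quoted above; the variable names and the decision to renormalise after dropping an indicator are my own illustration, not the compilers’ actual procedure.

    # Toy reconstruction of the ranking arithmetic, using only the weights and
    # Berkeley scores quoted in the text above.
    weights = {"peer": 0.40, "employer": 0.10, "staff_student": 0.20,
               "intl_staff": 0.05, "intl_students": 0.05, "citations": 0.20}

    berkeley = {"peer": 100, "employer": 100, "staff_student": 25,
                "intl_staff": 100, "intl_students": 86, "citations": 34}

    def overall(scores, w):
        """Weighted sum of indicator scores, each of which is out of 100."""
        return sum(w[k] * scores[k] for k in w)

    print(overall(berkeley, weights))   # 71.1 with the published weights

    # Drop the staff-student "teaching quality" proxy and renormalise the rest:
    w2 = {k: v for k, v in weights.items() if k != "staff_student"}
    total = sum(w2.values())
    w2 = {k: v / total for k, v in w2.items()}
    print(overall(berkeley, w2))        # about 82.6 -- a very different answer

The point is only that the final figure is exquisitely sensitive to weighting choices that are never justified.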

Of course you can laugh these things off as unimportant trivia to be looked at with mild amusement over a glass of wine, but such things have increasingly found their way into the minds of managers and politicians. The fact that they are based on flawed assumptions, use a daft methodology, and produce utterly meaningless results seems to be irrelevant. Because they are based on numbers they must represent some kind of absolute truth.

There’s nothing at all wrong with collating and publishing information about schools and universities. Such facts should be available to the public. What is wrong is the manic obsession with  condensing disparate sets of conflicting data into a single number just so things can be ordered in lists that politicians can understand.

You can see the same thing going on in the national newspapers’ lists of University rankings. Each one uses a different weighting and different data and the lists are drastically different. They give different answers because nobody has even bothered to think about what the question is.

The Law of Unreason

Posted in Bad Statistics, The Universe and Stuff on October 11, 2009 by telescoper

Not much time to post today, so I thought I’d just put up a couple of nice little quotes about the Central Limit Theorem. In case you don’t know it, this theorem explains why so many phenomena result in measurable things whose frequencies of occurrence can be described by the Normal (Gaussian) distribution, with its characteristic Bell-shaped curve. I’ve already mentioned the role that various astronomers played in the development of this bit of mathematics, so I won’t repeat the story in this post.

In fact I was asked to prove the theorem during my PhD viva, and struggled to remember how to do it, but it’s such an important thing that it was quite reasonable for my examiners  to ask the question and quite reasonable for them to have expected me to answer it! If you want to know how to do it, then I’ll give you a hint: it involves a Fourier transform!

Any of you who took a peep at João Magueijo’s lecture that I posted about yesterday will know that the title of his talk was Anarchy and Physical Laws. The main issue he addressed was whether the existence of laws of physics requires that the Universe must have been designed or whether mathematical regularities could somehow emerge from a state of lawlessness. Why the Universe is lawful is of course one of the greatest mysteries of all, and one that, for some at least, transcends science and crosses over into the realm of theology.

In my little address at the end of João’s talk I drew an analogy with the Central Limit Theorem, which is an example of an emergent mathematical law that describes situations which are apparently extremely chaotic. I just wanted to make the point that there are well-known examples of such things, even if the audience were sceptical about applying such notions to the entire Universe.

The quotation I picked was this one from Sir Francis Galton:

I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error”. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along

However, it is worth remembering also that not everything has a normal distribution: the central limit theorem requires linear, additive behaviour of the variables involved. I posted about an example where this is not the case here. Theorists love to make the Gaussian assumption when dealing with phenomena that they want to model with stochastic processes because these make many calculations tractable that otherwise would be too difficult. In cosmology, for example, we usually assume that the primordial density perturbations that seeded the formation of large-scale structure obeyed Gaussian statistics. Observers and experimentalists frequently assume Gaussian measurement errors in order to apply off-the-shelf statistical methods to their results. Often nature is kind to us but every now and again we find anomalies that are inconsistent with the normal distribution. Those exceptions usually lead to clues that something interesting is going on that violates the terms of the Central Limit Theorem. There are inklings that this may be the case in cosmology.
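If you want to see the theorem at work without any algebra, here is a minimal numerical sketch (my own illustration, not part of the original argument): averages of many independent uniform draws behave like a Gaussian, while averages of a heavy-tailed Cauchy variable, which violates the theorem’s finite-variance condition, do not.

    import math
    import random
    import statistics

    # Average N independent draws from a uniform distribution (nothing like a
    # Gaussian) and repeat many times; the Central Limit Theorem says the
    # averages should be approximately normally distributed.
    random.seed(1)
    N, trials = 50, 20000
    averages = [statistics.mean(random.random() for _ in range(N)) for _ in range(trials)]

    mu, sigma = statistics.mean(averages), statistics.stdev(averages)
    print(sum(abs(a - mu) < sigma for a in averages) / trials)      # ~0.68, as for a Gaussian
    print(sum(abs(a - mu) < 2 * sigma for a in averages) / trials)  # ~0.95

    # A Cauchy variable has no finite variance, so the theorem does not apply:
    # averages of Cauchy draws remain wildly scattered however large N is.
    cauchy = [statistics.mean(math.tan(math.pi * (random.random() - 0.5)) for _ in range(N))
              for _ in range(trials)]
    print(min(cauchy), max(cauchy))   # occasional enormous outliers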

So to balance Galton’s remarks, I add this quote by Gabriel Lippmann which I’ve taken the liberty of translating from the original French.

Everyone believes in the [normal] law of errors: the mathematicians, because they think it is an experimental fact; and the experimenters, because they suppose it is a theorem of mathematics

There are more things in heaven and earth than are described by the Gaussian distribution!

Astrostats

Posted in Bad Statistics, The Universe and Stuff on September 20, 2009 by telescoper

A few weeks ago I posted an item on the theme of how gambling games were good for the development of probability theory. That piece  contained a mention of one astronomer (Christiaan Huygens), but I wanted to take the story on a little bit to make the historical connection between astronomy and statistics more explicit.

Once the basics of mathematical probability had been worked out, it became possible to think about applying probabilistic notions to problems in natural philosophy. Not surprisingly, many of these problems were of astronomical origin but, on the way, the astronomers that tackled them also derived some of the basic concepts of statistical theory and practice. Statistics wasn’t just something that astronomers took off the shelf and used; they made fundamental contributions to the development of the subject itself.

The modern subject we now know as physics really began in the 16th and 17th centuries, although at that time it was usually called Natural Philosophy. The greatest early work in theoretical physics was undoubtedly Newton’s Principia, published in 1687, which presented his idea of universal gravitation and which, together with his famous three laws of motion, enabled him to account for the orbits of the planets around the Sun. But majestic though Newton’s achievements undoubtedly were, I think it is fair to say that the originator of modern physics was Galileo Galilei.

Galileo wasn’t as much of a mathematical genius as Newton, but he was highly imaginative, versatile and (very much unlike Newton) had an outgoing personality. He was also an able musician, fine artist and talented writer: in other words a true Renaissance man.  His fame as a scientist largely depends on discoveries he made with the telescope. In particular, in 1610 he observed the four largest satellites of Jupiter, the phases of Venus and sunspots. He immediately leapt to the conclusion that not everything in the sky could be orbiting the Earth and openly promoted the Copernican view that the Sun was at the centre of the solar system with the planets orbiting around it. The Catholic Church was resistant to these ideas. He was hauled up in front of the Inquisition and placed under house arrest. He died in the year Newton was born (1642).

These aspects of Galileo’s life are probably familiar to most readers, but hidden away among his scientific manuscripts and notebooks is an important first step towards a systematic method of statistical data analysis. Galileo performed numerous experiments, though he certainly didn’t carry out the one with which he is most commonly credited. He did establish that the speed at which bodies fall is independent of their weight, not by dropping things off the leaning tower of Pisa but by rolling balls down inclined slopes. In the course of his numerous forays into experimental physics Galileo realised that however carefully he took his measurements, the simplicity of the equipment available to him left him with quite large uncertainties in some of the results. He was able to estimate the accuracy of his measurements using repeated trials and sometimes ended up with a situation in which some measurements had larger estimated errors than others. This is a common occurrence in many kinds of experiment to this day.

Very often the problem we have in front of us is to measure two variables in an experiment, say X and Y. It doesn’t really matter what these two things are, except that X is assumed to be something one can control or measure easily and Y is whatever it is the experiment is supposed to yield information about. In order to establish whether there is a relationship between X and Y one can imagine a series of experiments where X is systematically varied and the resulting Y measured.  The pairs of (X,Y) values can then be plotted on a graph like the example shown in the Figure.

[Figure: scatter plot of measured (X,Y) pairs]

In this example it certainly looks like there is a straight line linking Y and X, but with small deviations above and below the line caused by the errors in measurement of Y. You could quite easily take a ruler and draw a line of “best fit” by eye through these measurements. I spent many a tedious afternoon in the physics labs doing this sort of thing when I was at school. Ideally, though, what one wants is some procedure for fitting a mathematical function to a set of data automatically, without requiring any subjective intervention or artistic skill. Galileo found a way to do this. Imagine you have a set of pairs of measurements (xi,yi) to which you would like to fit a straight line of the form y=mx+c. One way to do it is to find the line that minimizes some measure of the spread of the measured values around the theoretical line. The way Galileo did this was to work out the sum of the differences between the measured yi and the predicted values mxi+c at the measured values x=xi. He used the absolute difference |yi-(mxi+c)| so that the resulting optimal line would, roughly speaking, have as many of the measured points above it as below it. This general idea is now part of the standard practice of data analysis and, as far as I am aware, Galileo was the first scientist to grapple with the problem of dealing properly with experimental error.
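As a rough sketch of the criterion just described (the data and the crude grid search below are invented purely for illustration, and are certainly not how Galileo worked), one can pick the line that minimises the summed absolute deviations:

    import random

    # Fit y = m*x + c by minimising the sum of |y_i - (m*x_i + c)| over a grid
    # of candidate slopes and intercepts.  The "data" are simulated.
    random.seed(2)
    true_m, true_c = 2.0, 1.0
    xs = [0.5 * i for i in range(20)]
    ys = [true_m * x + true_c + random.gauss(0, 0.5) for x in xs]

    def total_abs_deviation(m, c):
        return sum(abs(y - (m * x + c)) for x, y in zip(xs, ys))

    best = min(((m / 100, c / 100) for m in range(150, 251) for c in range(50, 151)),
               key=lambda mc: total_abs_deviation(*mc))
    print(best)   # should land near the true values (2.0, 1.0)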

[Figure: deviations of the measured points from the fitted straight line]

The method used by Galileo was not quite the best way to crack the puzzle, but he had it almost right. It was again an astronomer who provided the missing piece and gave us essentially the same method used by statisticians (and astronomers) today.

Carl Friedrich Gauss was undoubtedly one of the greatest mathematicians of all time, so it might be objected that he wasn’t really an astronomer. Nevertheless he was director of the Observatory at Göttingen for most of his working life and was a keen observer and experimentalist. In 1809, he developed Galileo’s ideas into the method of least-squares, which is still used today for curve fitting.

This approach follows basically the same procedure but minimizes the sum of the squared residuals [yi-(mxi+c)]^2 rather than the sum of the absolute differences |yi-(mxi+c)|. This leads to a much more elegant mathematical treatment of the resulting deviations – the “residuals”. Gauss also did fundamental work on the mathematical theory of errors in general. The normal distribution is often called the Gaussian curve in his honour.
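The least-squares line, unlike the absolute-deviation one, has a simple closed-form solution; the formula below is standard textbook material rather than anything specific to this post, and the little data set is made up just to exercise it.

    # Least squares: m = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2),
    # and c = ybar - m * xbar.
    def least_squares_line(xs, ys):
        n = len(xs)
        xbar, ybar = sum(xs) / n, sum(ys) / n
        m = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
        return m, ybar - m * xbar

    print(least_squares_line([0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 7.1, 8.8]))  # (1.96, 1.10)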

After Galileo, the development of statistics as a means of data analysis in natural philosophy was dominated by astronomers. I can’t possibly go systematically through all the significant contributors, but I think it is worth devoting a paragraph or two to a few famous names.

I’ve already mentioned Jakob Bernoulli, whose famous book on probability was probably written during the 1690s. But Jakob was just one member of an extraordinary Swiss family that produced at least 11 important figures in the history of mathematics. Among them was Daniel Bernoulli, who was born in 1700. Along with the other members of his famous family, he had interests that ranged from astronomy to zoology. He is perhaps most famous for his work on fluid flows which forms the basis of much of modern hydrodynamics, especially Bernoulli’s principle, which accounts for changes in pressure as a gas or liquid flows along a pipe of varying width.

But the elder Jakob’s work on gambling clearly also had some effect on Daniel, as in 1735 the younger Bernoulli published an exceptionally clever study involving the application of probability theory to astronomy. It had been known for centuries that the orbits of the planets are confined to a narrow band in the sky as seen from Earth, called the Zodiac. This is because the Earth and the planets orbit in approximately the same plane around the Sun. The Sun’s path in the sky as the Earth revolves also follows the Zodiac. We now know that the flattened shape of the Solar System holds clues to the processes by which it formed from a rotating cloud of cosmic debris that settled into a disk from which the planets eventually condensed, but this idea was not well established in the time of Daniel Bernoulli. He set himself the challenge of working out the probability that the planets would all be orbiting in nearly the same plane purely by chance, rather than because some physical process had confined them to the plane of a protoplanetary disk. His conclusion? The odds against the inclinations of the planetary orbits being aligned by chance were, well, astronomical.
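This is emphatically not Bernoulli’s actual calculation, but a modern toy version of the question gives a feel for why the answer comes out that way; the 10-degree tolerance, the five planets and the folding of inclinations into the range 0–90 degrees are all my own arbitrary choices.

    import math

    # If a planet's orbital pole pointed in a completely random direction, the
    # chance that its orbit would lie within tilt_deg of a chosen reference
    # plane is 1 - cos(tilt_deg), folding inclinations into 0-90 degrees.
    def prob_within(tilt_deg):
        return 1.0 - math.cos(math.radians(tilt_deg))

    p_one = prob_within(10.0)          # one randomly oriented orbit within 10 degrees
    p_all = p_one ** 5                 # five further planets, assumed independent
    print(p_one, p_all, 1.0 / p_all)   # the odds against are around a billion to one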

The next “famous” figure I want to mention is not at all as famous as he should be. John Michell was a Cambridge graduate in divinity who became a village rector near Leeds. His most important idea was the suggestion he made in 1783 that sufficiently massive stars could generate such a strong gravitational pull that light would be unable to escape from them.  These objects are now known as black holes (although the name was coined much later by John Archibald Wheeler). In the context of this story, however, he deserves recognition for his use of a statistical argument that the number of close pairs of stars seen in the sky could not arise by chance. He argued that they had to be physically associated, not fortuitous alignments. Michell is therefore credited with the discovery of double stars (or binaries), although compelling observational confirmation had to wait until William Herschel’s work of 1803.

It is impossible to overestimate the importance of the role played by Pierre Simon, Marquis de Laplace in the development of statistical theory. His book A Philosophical Essay on Probabilities, which began as an introduction to a much longer and more mathematical work, is probably the first time that a complete framework for the calculation and interpretation of probabilities ever appeared in print. First published in 1814, it is astonishingly modern in outlook.

Laplace began his scientific career as an assistant to Antoine-Laurent Lavoisier, one of the founding fathers of chemistry. Laplace’s most important work was in astronomy, specifically in celestial mechanics, which involves explaining the motions of the heavenly bodies using the mathematical theory of dynamics. In 1796 he proposed the theory that the planets were formed from a rotating disk of gas and dust, which is in accord with the earlier assertion by Daniel Bernoulli that the planetary orbits could not be randomly oriented. In 1776 Laplace had also figured out a way of determining the average inclination of the planetary orbits.

A clutch of astronomers, including Laplace, also played important roles in the establishment of the Gaussian or normal distribution. I have already mentioned Gauss’s own part in this story, but other famous astronomers also contributed. The importance of the Gaussian distribution owes a great deal to a mathematical property called the Central Limit Theorem: the distribution of the sum of a large number of independent variables tends to have the Gaussian form. Laplace in 1810 proved a special case of this theorem, and Gauss himself also discussed it at length.

A general proof of the Central Limit Theorem was finally furnished in 1838 by another astronomer, Friedrich Wilhelm Bessel– best known to physicists for the functions named after him – who in the same year was also the first man to measure a star’s distance using the method of parallax. Finally, the name “normal” distribution was coined in 1850 by another astronomer, John Herschel, son of William Herschel.

I hope this gets the message across that the histories of statistics and astronomy are very closely linked. Aspiring young astronomers are often dismayed, when they enter research, to find that they need to do a lot of statistics. I’ve often complained that physics and astronomy education at universities usually includes almost nothing about statistics, even though it is the one thing you can guarantee to use as a researcher in practically any branch of the subject.

Over the years, statistics has come to be regarded as slightly disreputable by many physicists, perhaps echoing Rutherford’s comment along the lines of “If your experiment needs statistics, you ought to have done a better experiment”. That’s a silly statement anyway, because all experiments have some form of error that must be treated statistically, but it is particularly inapplicable to astronomy, which is not experimental but observational. Astronomers need to do statistics, and we owe it to the memory of all the great scientists I mentioned above to do our statistics properly.

Game Theory

Posted in Bad Statistics, Books, Talks and Reviews, The Universe and Stuff on September 5, 2009 by telescoper

Nowadays gambling is generally looked down on as something shady and disreputable, not to be discussed in polite company, or even to be banned altogether. However, the formulation of the basic laws of probability was almost exclusively inspired by their potential application to games of chance. Once established, these laws found a much wider range of applications in scientific contexts, including my own field of astronomy. I thought I’d illustrate this connection with a couple of examples. You may think that I’m just trying to make excuses for the fact that I also enjoy the odd bet every now and then!

Gambling in various forms has been around for millennia. Sumerian and Assyrian archaeological sites are littered with examples of a certain type of bone, called the astragalus (or talus bone). This is found just above the heel and its shape (in sheep and deer at any rate) is such that when it is tossed in the air it can land in any one of four possible orientations. It can therefore be used to generate “random” outcomes and is in many ways the forerunner of modern six-sided dice. The astragalus is known to have been used for gambling games as early as 3600 BC.


Unlike modern dice, which appeared around 2000BC, the astragalus is not symmetrical, giving a different probability of it landing in each orientation. It is not thought that there was a mathematical understanding of how to calculate odds in games involving this object or its more symmetrical successors.

Games of chance also appear to have been commonplace in the time of Christ – Roman soldiers are supposed to have drawn lots at the crucifixion, for example – but there is no evidence of any really formalised understanding of the laws of probability at this time.

Playing cards emerged in China sometime during the tenth century AD and were available in western Europe by the 14th century. This is an interesting development because playing cards can be used for games such as contract Bridge which involve a great deal of pure skill as well as an element of randomness. Perhaps it is this aspect that finally got serious intellectuals (i.e. physicists) excited about probability theory.

The first book on probability that I am aware of was by Gerolamo Cardano. His Liber de Ludo Aleae (Book on Games of Chance) was published in 1663, but it was written more than a century earlier. Probability theory really got going in 1654 with a famous correspondence between the two famous mathematicians Blaise Pascal and Pierre de Fermat, sparked off by a gambling addict called Antoine Gombaud, who styled himself the “Chevalier de Méré” (although he wasn’t actually a nobleman of any sort). The Chevalier de Méré had played a lot of dice games in his time and, although he didn’t have a rigorous mathematical theory of how they worked, he nevertheless felt he had an intuitive “feel” for what was a good bet and what wasn’t. In particular, he had done very well financially by betting at even money that he would roll at least one six in four rolls of a standard die.

It’s quite an easy matter to use the rules of probability to see why he was successful with this game. The probability that a single roll of a fair die yields a six is 1/6. The probability that it does not yield a six is therefore 5/6. The probability that four independent rolls produce no sixes at all is (the probability that the first roll is not a six) times (the probability that the second roll is not a six) times (the probability that the third roll is not a six) times (the probability that the fourth roll is not a six). Each of the probabilities involved in this multiplication is 5/6, so the result is (5/6)^4, which is 625/1296. But this is the probability of losing. The probability of winning is 1-625/1296 = 671/1296 = 0.5177, significantly higher than 50%. Since you’re more likely to win than lose, it’s a good bet.

So successful had this game been for de Méré that nobody would bet against him any more, and he had to think of another bet to offer. Using his “feel” for the dice, he reckoned that betting on one or more double-six in twenty-four rolls of a pair of dice at even money should also be a winner. Unfortunately for him, he started to lose heavily on this game and in desperation wrote to his friend Pascal to ask why. This set Pascal wondering, and he in turn started a correspondence about it with Fermat.

This strange turn of events led not only to the beginnings of a general formulation of probability theory, but also to the binomial distribution and the beautiful mathematical construction now known as Pascal’s Triangle.

The full story of this is recounted in the fascinating book shown above, but the immediate upshot for de Méré was that he abandoned this particular game.

To see why, just consider each throw of a pair of dice as a single “event”. There are 36 possible events corresponding to six possible outcomes on each of the dice (6×6=36). The probability of getting a double six in such an event is 1/36 because only one of the 36 events corresponds to two sixes. The probability of not getting a double six is therefore 35/36. The probability that a set of 24 independent fair throws of a pair of dice produces no double-sixes at all is therefore 35/36 multiplied by itself 24 times, or (35/36)^24. This is 0.5086, which is slightly higher than 50%. The probability that at least one double-six occurs is therefore 1-0.5086, or 0.4914. Our Chevalier has a less than 50% chance of winning, so an even money bet is not a good idea, unless he plans to use this scheme as a tax dodge.
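For anyone who wants to check the arithmetic of both games, here is a two-line verification (my own snippet; nothing in the argument depends on it):

    # The complement rule used above, applied to both of the Chevalier's games.
    p_game1 = 1 - (5 / 6) ** 4      # at least one six in four rolls of one die
    p_game2 = 1 - (35 / 36) ** 24   # at least one double-six in 24 rolls of two dice
    print(p_game1)                  # 0.5177...: better than evens, a good bet
    print(p_game2)                  # 0.4914...: worse than evens, a loser at even money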

Although both Fermat and Pascal made important contributions to many diverse aspects of scientific thought in addition to pure mathematics, including physics, the first real astronomer to contribute to the development of probability in the context of gambling was Christiaan Huygens, the man who discovered the rings of Saturn in 1655. Two years after his famous astronomical discovery, he published a book called Calculating in Games of Chance, which introduced the concept of expectation. However, the fuller development of the statistical theory underlying games and gambling came with the publication in 1713 of Jakob Bernoulli’s wonderful treatise entitled Ars Conjectandi, which did a great deal to establish the general mathematical theory of probability and statistics.

The Inductive Detective

Posted in Bad Statistics, Literature, The Universe and Stuff on September 4, 2009 by telescoper

I was watching an old episode of Sherlock Holmes last night – from the classic  Granada TV series featuring Jeremy Brett’s brilliant (and splendidly camp) portrayal of the eponymous detective. One of the  things that fascinates me about these and other detective stories is how often they use the word “deduction” to describe the logical methods involved in solving a crime.

As a matter of fact, what Holmes generally uses is not really deduction at all, but inference (a process which is predominantly inductive).

In deductive reasoning, one tries to tease out the logical consequences of a premise; the resulting conclusions are, generally speaking, more specific than the premise. “If these are the general rules, what are the consequences for this particular situation?” is the kind of question one can answer using deduction.

The kind of reasoning Holmes employs, however, is essentially the opposite of this. The question being answered is of the form: “From a particular set of observations, what can we infer about the more general circumstances relating to them?”. The following example from A Study in Scarlet is exactly of this type:

From a drop of water a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other.

The word “possibility” makes it clear that no certainty is attached to the actual existence of either the Atlantic or Niagara, but the implication is that observations of (and perhaps experiments on) a single water drop could allow one to infer enough of the general properties of water to deduce the possible existence of other phenomena. The fundamental process is inductive rather than deductive, although deductions do play a role once general rules have been established.

In the example quoted there is  an inductive step between the water drop and the general physical and chemical properties of water and then a deductive step that shows that these laws could describe the Atlantic Ocean. Deduction involves going from theoretical axioms to observations whereas induction  is the reverse process.

I’m probably labouring this distinction, but the main point of doing so is that a great deal of science is fundamentally inferential and, as a consequence, it entails dealing with inferences (or guesses, or conjectures) whose application to real facts is inherently uncertain. Dealing with these uncertain aspects requires a more general kind of logic than the simple Boolean form employed in deductive reasoning. This side of the scientific method is sadly neglected in most approaches to science education.

In physics, the attitude is usually to establish the rules (“the laws of physics”) as axioms (though perhaps giving some experimental justification). Students are then taught to solve problems which generally involve working out particular consequences of these laws. This is all deductive. I’ve got nothing against this: it is what a great deal of theoretical research in physics is actually like, and it forms an essential part of the training of a physicist.

However, one of the aims of physics – especially fundamental physics – is to try to establish what the laws of nature actually are from observations of particular outcomes. It would be simplistic to say that this was entirely inductive in character. Sometimes deduction plays an important role in scientific discoveries. For example, Albert Einstein deduced his Special Theory of Relativity from a postulate that the speed of light was constant for all observers in uniform relative motion. However, the motivation for this entire chain of reasoning arose from previous studies of electromagnetism which involved a complicated interplay between experiment and theory that eventually led to Maxwell’s equations. Deduction and induction are both involved at some level in a kind of dialectical relationship.

The synthesis of the two approaches requires an evaluation of the evidence the data provide concerning the different theories. This evidence is rarely conclusive, so a wider range of logical possibilities than “true” or “false” needs to be accommodated. Fortunately, there is a quantitative and logically rigorous way of doing this. It is called Bayesian probability. In this way of reasoning, the probability (a number between 0 and 1 attached to a hypothesis, model, or anything that can be described as a logical proposition of some sort) represents the extent to which a given set of data supports the given hypothesis. The calculus of probabilities only reduces to Boolean algebra when the probabilities of all hypotheses involved are either unity (certainly true) or zero (certainly false). In between “true” and “false” there are varying degrees of “uncertain”, represented by a number between 0 and 1, i.e. the probability.
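For readers who want the rule written down, this is standard textbook material rather than anything specific to this post: Bayes’ theorem gives the posterior probability of a hypothesis H in the light of data D as P(H|D) = P(D|H)P(H)/P(D), where P(H) is the prior probability and P(D|H) is the likelihood. Setting every probability to either 0 or 1 recovers the ordinary Boolean logic described above.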

Overlooking the importance of inductive reasoning has led to numerous pathological developments that have hindered the growth of science. One example is the widespread and remarkably naive devotion that many scientists have towards the philosophy of the anti-inductivist Karl Popper; his doctrine of falsifiability has led to an unhealthy neglect of  an essential fact of probabilistic reasoning, namely that data can make theories more probable. More generally, the rise of the empiricist philosophical tradition that stems from David Hume (another anti-inductivist) spawned the frequentist conception of probability, with its regrettable legacy of confusion and irrationality.

My own field of cosmology provides the largest-scale illustration of this process in action. Theorists make postulates about the contents of the Universe and the laws that describe it and try to calculate what measurable consequences their ideas might have. Observers make measurements as best they can, but these are inevitably restricted in number and accuracy by technical considerations. Over the years, theoretical cosmologists deductively explored the possible ways Einstein’s General Theory of Relativity could be applied to the cosmos at large. Eventually a family of theoretical models was constructed, each of which could, in principle, describe a universe with the same basic properties as ours. But determining which, if any, of these models applied to the real thing required more detailed data. For example, observations of the properties of individual galaxies led to the inferred presence of cosmologically important quantities of dark matter. Inference also played a key role in establishing the existence of dark energy as a major part of the overall energy budget of the Universe. The result is that we have now arrived at a standard model of cosmology which accounts pretty well for most relevant data.

Nothing is certain, of course, and this model may well turn out to be flawed in important ways. All the best detective stories have twists in which the favoured theory turns out to be wrong. But although the puzzle isn’t exactly solved, we’ve got good reasons for thinking we’re nearer to at least some of the answers than we were 20 years ago.

I think Sherlock Holmes would have approved.

Simpson’s Paradox

Posted in Bad Statistics on August 30, 2009 by telescoper

I haven’t put anything in the Bad Statistics category for a while, so I thought I’d put this interesting little example up for your perusal.

Although my own field of modern cosmology requires a great deal of complicated statistical reasoning, cosmologists have it relatively easy because there is not much chance that any errors we make will actually end up harming anyone. Speculations about the Anthropic Principle or Theories of Everything are sometimes reported in the mass media but, if they are, and are garbled, the resulting confusion is unlikely to be fatal. The same cannot be said of the field of medical statistics. I can think of scores of examples where poor statistical reasoning has been responsible for a shambles in the domain of public health.

Here’s an example of how a relatively simple statistical test can lead to total confusion. In this version, it is known as Simpson’s Paradox.

 A standard thing to do in a medical trial is to take a set of patients suffering from some condition and divide them into two groups. One group is given a treatment (T) and the other group is given a placebo; this latter group is called the control and I will denote it T* (no treatment).

To make things specific, suppose we have 100 patients, of whom 50 are actively treated and 50 form the control. Suppose that at the end of the trial each patient can be classified as recovered (“R”) or not recovered (“R*”). Consider the following outcome, displayed in a contingency table:

        R     R*    Total   Recovery
T       20    30    50      40%
T*      16    34    50      32%
Totals  36    64    100

 

 Clearly the recovery rate for those actively treated (40%) exceeds that for the control group, so the treatment seems at first sight to produce some benefit.

 Now let us divide the group into older and younger patients: the young group Y contains those under 50 years old (carefully defined so that I would belong to it) and Y* is those over 50.

The following results are obtained for the young patients:

        R     R*    Total   Recovery
T       19    21    40      47.5%
T*      5     5     10      50%
Totals  24    26    50

The older group returns the following data: 

        R     R*    Total   Recovery
T       1     9     10      10%
T*      11    29    40      27.5%
Totals  12    38    50

 For each of the two groups separately, the recovery rate for the control exceeds that of the treated patients. The placebo works better than the treatment for the young and the old separately, but for the population as a whole the treatment seems to work better than the placebo!

This seems very confusing, and just think how many medical reports in newspapers contain results of this type: drinking red wine is good for you, eating meat is bad for you, and so on. What has gone wrong?

The key to this paradox is to note that many more of the younger patients are actually in the treatment group than in the non-treatment group, while the situation is reversed for the older patients. The result is to confuse the effect of the treatment with a perfectly possible dependence of recovery on the age of the recipient. In essence this is a badly designed trial, but there is no doubting that it is a subtle effect, and not one that most people could understand without a great deal of careful explanation, which it is unlikely to get in the pages of a newspaper.
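Here is a minimal sketch that simply redoes the arithmetic of the tables above (the counts are the ones in the tables; the code itself is only an illustration):

    # Recovery rates within each age group favour the control, yet the
    # aggregated rates favour the treatment, because the treated group is
    # dominated by the (easily recovering) young patients.
    groups = {
        "young": {"T": (19, 40), "T*": (5, 10)},   # (recovered, total)
        "old":   {"T": (1, 10),  "T*": (11, 40)},
    }

    for age, arms in groups.items():
        for arm, (r, n) in arms.items():
            print(age, arm, f"{100 * r / n:.1f}%")

    for arm in ("T", "T*"):
        r = sum(groups[age][arm][0] for age in groups)
        n = sum(groups[age][arm][1] for age in groups)
        print("combined", arm, f"{100 * r / n:.1f}%")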

A Mountain of Truth

Posted in Bad Statistics, The Universe and Stuff on August 1, 2009 by telescoper

I spent the last week at a conference in a beautiful setting amidst the hills overlooking the small town of Ascona by Lake Maggiore in the canton of Ticino, the Italian-speaking part of Switzerland. To be more precise, we were located in a conference centre called the Centro Stefano Franscini on Monte Verità. The meeting was COSMOSTATS, which aimed

… to bring together world-class leading figures in cosmology and particle physics, as well as renowned statisticians, in order to exchange knowledge and experience in dealing with large and complex data sets, and to meet the challenge of upcoming large cosmological surveys.

Although I didn’t know much about the location beforehand it turns out to have an extremely interesting history, going back about a hundred years. The first people to settle there, around the end of the 19th Century, were anarchists who had sought refuge there during times of political upheaval. The Locarno region had long been a popular place for people with “alternative” lifestyles. Monte Verità (“The Mountain of Truth”) was eventually bought by Henri Oedenkoven, the son of a rich industrialist, and he set up a sort of commune there, at which the residents practised vegetarianism, naturism, free love and other forms of behaviour that were intended as a reaction against the scientific and technological progress of the time. From about 1904 onward the centre became a sanatorium where the discipline of psychoanalysis flourished, and it later attracted many artists. In 1927, Baron Eduard von der Heydt took the place over. He was a great connoisseur of Oriental philosophy and a collector of art, and he established a large collection at Monte Verità, much of which is still there because when the Baron died in 1956 he left Monte Verità to the local Canton.

Given the bizarre collection of anarchists, naturists, theosophists (and even vegetarians) that used to live in Monte Verità, it is by no means out of keeping with the tradition that it should eventually play host to a conference of cosmologists and statisticians.

The  conference itself was interesting, and I was lucky enough to get to chair a session with three particularly interesting talks in it. In general, though, these dialogues between statisticians and physicists don’t seem to be as productive as one might have hoped. I’ve been to a few now, and although there’s a lot of enjoyable polemic they don’t work too well at changing anyone’s opinion or providing new insights.

We may now have mountains of new data in cosmology and particle physics, but that hasn’t always translated into a corresponding mountain of truth. Intervening between our theories and observations lies the vexed question of how best to analyse the data and what the results actually mean. As always, lurking in the background, was the long-running conflict between adherents of the Bayesian and frequentist interpretations of probability. It appears that cosmologists – at least those represented at this meeting – tend to be Bayesian while particle physicists are almost exclusively frequentist. I’ll refrain from commenting on what this might mean. However, I was perplexed by various comments made during the conference about the issue of coverage, which is discussed rather nicely in some detail here. To me the question of whether a Bayesian method has good frequentist coverage properties is completely irrelevant. Bayesian methods ask different questions (actually, ones to which scientists want to know the answer) so it is not surprising that they give different answers. Measuring a Bayesian method according to a frequentist criterion is completely pointless whichever camp you belong to.

The irrelevance of coverage was one thing that the previous residents knew better than some of the conference guests:

[Photograph of Monte Verità’s early residents]

I’d like to thank  Uros Seljak, Roberto Trotta and Martin Kunz for organizing the meeting in such a  picturesque and intriguing place.

First Digits and Electoral Fraud in Iran

Posted in Bad Statistics on June 22, 2009 by telescoper

An interesting issue has arisen recently about the possibility that the counting of the recent hotly contested Iranian election results might have been fraudulent. I mention it here because it involves  Benford’s Law – otherwise known as the First Digit Phenomenon – which I’ve blogged about before.

Apparently what started this off was a post on the arXiv by the cosmologist Boudewijn Roukema, but I first heard about it myself via a pingback from another WordPress blog. The same blogger has written a subsequent analysis here.

I’m not going to go into this in more detail here: the others involved have an enormous headstart and in any case I wouldn’t want to try to steal their thunder. Suffice to say that there is at least a suspicion that the distribution of first digits in the published results is more uniform than would be expected by chance, given that the general behaviour under Benford’s Law is for more numbers to begin with the digit “1” than with any other. This apparently paradoxical result is quite easily explained. It also provides a way to check for fraud in, for example, tax returns. How it applies to election results is, however, not so clear and the analysis is a bit controversial.
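For the record, Benford’s Law predicts that a leading digit d occurs with probability log10(1 + 1/d). The sketch below tabulates those expected frequencies and shows how one might count first digits in a list of totals; the sample numbers are invented and have nothing to do with the Iranian returns.

    import math
    from collections import Counter

    # Benford's expected first-digit frequencies: P(d) = log10(1 + 1/d).
    benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    print(benford[1])   # about 0.301: nearly a third of first digits should be 1

    # First-digit frequencies for an arbitrary (made-up) list of vote totals.
    totals = [3523, 120411, 88, 9134, 27016, 541, 1902, 66512, 730, 15]
    first_digits = Counter(int(str(t)[0]) for t in totals)
    observed = {d: first_digits.get(d, 0) / len(totals) for d in range(1, 10)}
    print(observed)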

I’m sure some of you out there will have time to look at this in more detail so I encourage you to do so…

 

Oh. The story is gathering momentum elsewhere too. See here.

The Doomsday Argument

Posted in Bad Statistics, The Universe and Stuff on April 29, 2009 by telescoper

I don’t mind admitting that as I get older I get more and  more pessimistic about the prospects for humankind’s survival into the distant future.

Unless there are major changes in the way it is governed, our planet may become barren and uninhabitable through war or environmental catastrophe. But I do think the future is in our hands, and disaster is, at least in principle, avoidable. In this respect I have to distance myself from a very strange argument that has been circulating among philosophers and physicists for a number of years. It is called the Doomsday argument, and it even has a sizeable Wikipedia entry, to which I refer you for more details and variations on the basic theme. As far as I am aware, it was first introduced by the mathematical physicist Brandon Carter and subsequently developed and expanded by the philosopher John Leslie (not to be confused with the TV presenter of the same name). It also re-appeared in a slightly different guise through a paper in the serious scientific journal Nature by the eminent physicist Richard Gott. Evidently, for some reason, some serious people take it very seriously indeed.

The Doomsday argument uses the language of probability theory, but it is such a strange argument that I think the best way to explain it is to begin with a more straightforward problem of the same type.

 Imagine you are a visitor in an unfamiliar, but very populous, city. For the sake of argument let’s assume that it is in China. You know that this city is patrolled by traffic wardens, each of whom carries a number on their uniform.  These numbers run consecutively from 1 (smallest) to T (largest) but you don’t know what T is, i.e. how many wardens there are in total. You step out of your hotel and discover traffic warden number 347 sticking a ticket on your car. What is your best estimate of T, the total number of wardens in the city?

 I gave a short lunchtime talk about this when I was working at Queen Mary College, in the University of London. Every Friday, over beer and sandwiches, a member of staff or research student would give an informal presentation about their research, or something related to it. I decided to give a talk about bizarre applications of probability in cosmology, and this problem was intended to be my warm-up. I was amazed at the answers I got to this simple question. The majority of the audience denied that one could make any inference at all about T based on a single observation like this, other than that it  must be at least 347.

 Actually, a single observation like this can lead to a useful inference about T, using Bayes’ theorem. Suppose we have really no idea at all about T before making our observation; we can then adopt a uniform prior probability. Of course there must be an upper limit on T. There can’t be more traffic wardens than there are people, for example. Although China has a large population, the prior probability of there being, say, a billion traffic wardens in a single city must surely be zero. But let us take the prior to be effectively constant. Suppose the actual number of the warden we observe is t. Now we have to assume that we have an equal chance of coming across any one of the T traffic wardens outside our hotel. Each value of t (from 1 to T) is therefore equally likely. I think this is the reason that my astronomers’ lunch audience thought there was no information to be gleaned from an observation of any particular value, i.e. t=347.

 Let us simplify this argument further by allowing two alternative “models” for the frequency of Chinese traffic wardens. One has T=1000, and the other (just to be silly) has T=1,000,000. If I find number 347, which of these two alternatives do you think is more likely? Think about the kind of numbers that occupy the range from 1 to T. In the first case, most of the numbers have 3 digits. In the second, most of them have 6. If there were a million traffic wardens in the city, it is quite unlikely you would find a random individual with a number as small as 347. If there were only 1000, then 347 is just a typical number. There are strong grounds for favouring the first model over the second, simply based on the number actually observed. To put it another way, we would be surprised to encounter number 347 if T were actually a million. We would not be surprised if T were 1000.

One can extend this argument to the entire range of possible values of T, and ask a more general question: if I observe traffic warden number t, what is the probability I should assign to each value of T? The answer is found using Bayes’ theorem. The prior, as I assumed above, is uniform. The likelihood is the probability of the observation given the model. If I assume a value of T, the probability P(t|T) of each value of t (up to and including T) is just 1/T (since each of the wardens is equally likely to be encountered). Bayes’ theorem can then be used to construct the posterior probability P(T|t). Without going through all the nuts and bolts, I hope you can see that this probability will tail off for large T. Our observation of a (relatively) small value for t should lead us to suspect that T is itself (relatively) small. Indeed it’s a reasonable “best guess” that T=2t. This makes intuitive sense because the observed value of t then lies right in the middle of its range of possibilities.

Before going on, it is worth mentioning one other point about this kind of inference: it is not at all powerful. Note that the likelihood just varies as 1/T. That of course means that small values are favoured over large ones. But note that this probability is uniform in logarithmic terms. So although T=1000 is more probable than T=1,000,000, the range between 1000 and 10,000 is roughly as likely as the range between 1,000,000 and 10,000,000, assuming there is no prior information. So although it tells us something, it doesn’t actually tell us very much. Just like any probabilistic inference, there’s a chance that it is wrong, perhaps very wrong.
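To make that concrete, here is a rough numerical sketch of the inference (t = 347 is the number from the story; the prior ceiling Tmax and everything else are arbitrary choices of mine):

    # Flat prior on T up to an arbitrary ceiling Tmax, likelihood P(t|T) = 1/T
    # for T >= t (and zero for T < t), posterior from Bayes' theorem.
    t, Tmax = 347, 1_000_000

    def mass(lo, hi):
        """Unnormalised posterior mass between lo and hi (sum of 1/T)."""
        return sum(1.0 / T for T in range(max(lo, t), min(hi, Tmax) + 1))

    norm = mass(t, Tmax)
    print(mass(t, 10 * t) / norm)          # T within a factor of 10 of the observed number
    print(mass(10 * t, 100 * t) / norm)    # the next factor of 10 is roughly as probable
    print(mass(100 * t, 1000 * t) / norm)  # ...and the next: nearly flat in log T

The posterior peaks at T = t and falls off only logarithmically, which is exactly the weakness described above.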

What does all this have to do with Doomsday? Instead of traffic wardens, we want to estimate N, the number of humans that will ever be born. Following the same logic as in the example above, I assume that I am a “randomly” chosen individual drawn from the sequence of all humans to be born, in past, present and future. For the sake of argument, assume I am number n in this sequence. The logic I explained above should lead me to conclude that the total number N is not much larger than my number, n. For the sake of argument, assume that I am the one-billionth human to be born, i.e. n=1,000,000,000. There should not be many more than a few billion humans ever to be born. At the rate of current population growth, this means that not many more generations of humans remain to be born. Doomsday is nigh.

Richard Gott’s version of this argument is logically similar, but is based on timescales rather than numbers. If whatever thing we are considering begins at some time tbegin and ends at a time tend and if we observe it at a “random” time between these two limits, then our best estimate for its future duration is of order how long it has lasted up until now. Gott gives the example of Stonehenge, which was built about 4,000 years ago: we should expect it to last a few thousand years into the future. Actually, Stonehenge is a highly dubious example. It hasn’t really survived 4,000 years. It is a ruin, and nobody knows its original form or function. However, the argument goes that if we come across a building put up about twenty years ago, presumably we should think it will come down again (whether by accident or design) in about twenty years time. If I happen to walk past a building just as it is being finished, presumably I should hang around and watch its imminent collapse….

But I’m being facetious.

Following this chain of thought, we would argue that, since humanity has been around a few hundred thousand years, it is expected to last a few hundred thousand years more. Doomsday is not quite as imminent as previously, but in any case humankind is not expected to survive sufficiently long to, say, colonize the Galaxy.

You may reject this type of argument on the grounds that you do not accept my logic in the case of the traffic wardens. If so, I think you are wrong. I would say that if you accept all the assumptions entering into the Doomsday argument then it is an equally valid example of inductive inference. The real issue is whether it is reasonable to apply this argument at all in this particular case. There are a number of related examples that should lead one to suspect that something fishy is going on. Usually the problem can be traced back to the glib assumption that something is “random” when it is not, or when it is not clearly stated what that is supposed to mean.

There are around sixty million British people on this planet, of whom I am one. In contrast there are well over a billion Chinese. If I follow the same kind of logic as in the examples I gave above, I should be very perplexed by the fact that I am not Chinese. After all, the odds are more than 20 to 1 against me being British, aren’t they?

 Of course, I am not at all surprised by the observation of my non-Chineseness. My upbringing gives me access to a great deal of information about my own ancestry, as well as the geographical and political structure of the planet. This data convinces me that I am not a “random” member of the human race. My self-knowledge is conditioning information and it leads to such a strong prior knowledge about my status that the weak inference I described above is irrelevant. Even if there were a million million Chinese and only a hundred British, I have no grounds to be surprised at my own nationality given what else I know about how I got to be here.

This kind of conditioning information can be applied to history, as well as geography. Each individual is generated by its parents. Its parents were generated by their parents, and so on. The genetic trail of these reproductive events connects us to our primitive ancestors in a continuous chain. A well-informed alien geneticist could look at my DNA and categorize me as an “early human”. I simply could not be born later in the story of humankind, even if it does turn out to continue for millennia. Everything about me – my genes, my physiognomy, my outlook, and even the fact that I am bothering to spend time discussing this so-called paradox – is contingent on my specific place in human history. Future generations will know so much more about the universe and the risks to their survival that they won’t even discuss this simple argument. Perhaps we just happen to be living at the only epoch in human history in which we know enough about the Universe for the Doomsday argument to make some kind of sense, but too little to resolve it.

To see this in a slightly different light, think again about Gott’s timescale argument. The other day I met an old friend from school days. It was a chance encounter, and I hadn’t seen the person for over 25 years. In that time he had married, and when I met him he was accompanied by a baby daughter called Mary. If we were to take Gott’s argument seriously, this was a random encounter with an entity (Mary) that had existed for less than a year. Should I infer that this entity should probably only endure another year or so? I think not. Again, bare numerological inference is rendered completely irrelevant by the conditioning information I have. I know something about babies. When I see one I realise that it is an individual at the start of its life, and I assume that it has a good chance of surviving into adulthood. Human civilization is a baby civilization. Like any youngster, it has dangers facing it. But it is not doomed by the mere fact that it is young.

John Leslie has developed many different variants of the basic Doomsday argument, and I don’t have the time to discuss them all here. There is one particularly bizarre version, however, that I think merits a final word or two because it raises an interesting red herring. It’s called the “Shooting Room”.

 Consider the following model for human existence. Souls are called into existence in groups representing each generation. The first generation has ten souls. The next has a hundred, the next after that a thousand, and so on. Each generation is led into a room, at the front of which is a pair of dice. The dice are rolled. If the score is double-six then everyone in the room is shot and it’s the end of humanity. If any other score is shown, everyone survives and is led out of the Shooting Room to be replaced by the next generation, which is ten times larger. The dice are rolled again, with the same rules. You find yourself called into existence and are led into the room along with the rest of your generation. What should you think is going to happen?

Leslie’s argument is the following. Each generation not only has more members than the previous one, but also contains more souls than have ever existed up to that point. For example, the third generation has 1000 souls; the previous two had 10 and 100 respectively, i.e. 110 altogether. Roughly 90% of all humanity lives in the last generation: whenever the last generation happens, there are bound to be more people in it than in all the generations up to that point. When you are called into existence you should therefore expect to be in the last generation. You should consequently expect that the dice will show double-six and the celestial firing squad will take aim. On the other hand, if you think the dice are fair then each throw is independent of the previous one and a throw of double-six has a probability of just one in thirty-six. On this basis, you should expect to survive. The odds are against the fatal score.
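To see the two competing intuitions side by side, here is a minimal simulation sketch of the Shooting Room as described above (in Python; the function name is just mine for illustration). Generations of 10, 100, 1000, … souls enter in turn and a pair of fair dice decides their fate:

```python
import random

def shooting_room_run(rng):
    """One run of the Shooting Room: generations of 10, 100, 1000, ...
    souls enter until a pair of fair dice shows double-six."""
    size, souls_so_far = 10, 0
    while True:
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        if d1 == 6 and d2 == 6:
            return size, souls_so_far + size   # doomed generation, total souls
        souls_so_far += size
        size *= 10                             # next generation is ten times larger

rng = random.Random(42)
n_runs = 10_000
fractions = []
for _ in range(n_runs):
    final_size, total_souls = shooting_room_run(rng)
    fractions.append(final_size / total_souls)

# In every run the doomed generation contains roughly 90% of all the
# souls ever called into existence...
print("mean fraction of souls in the final generation:",
      sum(fractions) / n_runs)
# ...and yet the dice give each generation a 35/36 chance of walking out.
print("per-generation survival probability:", 35 / 36)
```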

 This apparent paradox seems to suggest that it matters a great deal whether the future is predetermined (your presence in the last generation requires the double-six to fall) or “random” (in which case there is the usual probability of a double-six). Leslie argues that if everything is pre-determined then we’re doomed. If there’s some indeterminism then we might survive. This isn’t really a paradox at all, simply an illustration of the fact that assuming different models gives rise to different probability assignments.

While I am on the subject of the Shooting Room, it is worth drawing a parallel with another classic puzzle of probability theory, the St Petersburg Paradox. This is an old chestnut to do with a purported winning strategy for Roulette. It was first proposed by Nicolas Bernoulli but famously discussed at greatest length by Daniel Bernoulli in the pages of the Transactions of the St Petersburg Academy, hence the name. It works just as well for a simple toss of a coin as for Roulette, since in the latter game the strategy involves betting only on red or black rather than on individual numbers.

Imagine you decide to bet such that you win by throwing heads. Your original stake is £1. If you win, the bank pays you at even money (i.e. you get your stake back plus another £1). If you lose, i.e. get tails, your strategy is to play again but bet double. If you win this time you get £4 back but have bet £2+£1=£3 up to that point. If you lose again you double up once more and bet £4, and if that loses too you bet £8. If you win at that point, you get £16 back but have paid in £8+£4+£2+£1=£15 along the way. Clearly, if you carry on the strategy of doubling your previous stake each time you lose, when you do eventually win you will be ahead by £1. It’s a guaranteed winner. Isn’t it?

 The answer is yes, as long as you can guarantee that the number of losses you will suffer is finite. But in tosses of a fair coin there is no limit to the number of tails you can throw before getting a head. To get the correct probability of winning you have to allow for all possibilities. So what is your expected stake to win this £1? The answer is the root of the paradox. The probability that you win straight off is ½ (you need to throw a head), and your stake is £1 in this case so the contribution to the expectation is £0.50. The probability that you win on the second go is ¼ (you must lose the first time and win the second so it is ½ times ½) and your stake this time is £2 so this contributes the same £0.50 to the expectation. A moment’s thought tells you that each throw contributes the same amount, £0.50, to the expected stake. We have to add this up over all possibilities, and there are an infinite number of them. The result of summing them all up is therefore infinite. If you don’t believe this just think about how quickly your stake grows after only a few losses: £1, £2, £4, £8, £16, £32, £64, £128, £256, £512, £1024, etc. After only ten losses you are staking over a thousand pounds just to get your pound back. Sure, you can win £1 this way, but you need to expect to stake an infinite amount to guarantee doing so. It is not a very good way to get rich.
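As a rough sanity check on this divergence, here is a sketch of the doubling strategy played with a fair, unbounded coin (Python; the function name is my own). Every run ends exactly £1 ahead, but the amount you have to put on the table is heavy-tailed, and its sample average is dominated by rare long runs of tails, consistent with an infinite expectation:

```python
import random

def doubling_strategy_run(rng):
    """Bet £1 on heads; double the stake after every tails; stop at the
    first head. Returns (total amount staked, net profit)."""
    stake, total_staked = 1, 0
    while True:
        total_staked += stake
        if rng.random() < 0.5:                     # heads: win this round at even money
            return total_staked, 2 * stake - total_staked
        stake *= 2                                 # tails: double up and try again

rng = random.Random(2024)
results = [doubling_strategy_run(rng) for _ in range(100_000)]
stakes = [s for s, _ in results]
profits = [p for _, p in results]

print("every run wins exactly £1:", set(profits) == {1})
print(f"average total staked per run: £{sum(stakes) / len(stakes):.0f}")
print(f"largest total staked in a single run: £{max(stakes)}")
```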

The relationship of all this to the Shooting Room is that it shows how dangerous it is to pre-suppose a finite value for a number which in principle could be infinite. If the number of souls that could be called into existence is allowed to be infinite, then any individual has no chance at all of being called into existence in any particular generation!

Amusing as they are, the thing that makes me most uncomfortable about these Doomsday arguments is that they attempt to determine the probability of an event without any reference to an underlying mechanism. For me, a valid argument about Doomsday would have to involve a particular physical cause for the extinction of humanity (e.g. asteroid impact, climate change, nuclear war, etc). Given this physical mechanism one should construct a model within which one can estimate probabilities for the model parameters (such as the rate of occurrence of catastrophic asteroid impacts). Only then can one make a valid inference based on relevant observations and their associated likelihoods. Such calculations may indeed lead to alarming or depressing results. I fear that the greatest risk to our future survival is not from asteroid impact or global warming, where the chances can be estimated with reasonable precision, but from self-destructive violence carried out by humans themselves. Science has no way of predicting what atrocities people are capable of, so we can’t make any reliable estimate of the probability that we will self-destruct. But the absence of any specific mechanism in the versions of the Doomsday argument I have discussed robs them of any scientific credibility at all.

There are better grounds for worrying about the future than mere numerology.

The First Digit Phenomenon

Posted in Bad Statistics, The Universe and Stuff with tags , , on March 11, 2009 by telescoper

I thought it would be fun to put up this quirky example of how sometimes things that really ought to be random turn out not to be. It’s also an excuse to mention a strange connection between astronomy and statistics.

The astronomer Simon Newcomb was born in 1835 in Nova Scotia (Canada). He had no real formal education at all, but since there wasn’t much else to do in Nova Scotia, he taught himself mathematics and astronomy and became very adept at performing astronomical calculations with great diligence. He began work in a lowly position at the US Nautical Almanac Office in 1857, and by 1877 he was its director. He was professor of Mathematics and Astronomy at Johns Hopkins University from 1884 until 1893 and was made the first ever president of the American Astronomical Society in 1899; he died in 1909.

Newcomb was performing lengthy numerical calculations in an era long before the invention of the pocket calculator or desktop computer. In those days many such calculations, including virtually anything involving multiplication, had to be done using logarithms. The logarithm (to the base ten) of a number x is defined to be the number a such that x = 10^a. To multiply two numbers whose logarithms are a and b respectively involves simply adding the logarithms: 10^a × 10^b = 10^(a+b), which helps a lot because adding is a lot easier than multiplying if you have no calculator. The initial logarithms are simply looked up in a table; to find the answer you use different tables to find the “inverse” logarithm.
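To make the table-user’s procedure concrete, here is a tiny sketch (in Python, standing in for the printed tables) of the same trick: look up the two base-ten logarithms, add them, and take the inverse logarithm of the sum.

```python
import math

x, y = 123.4, 56.7

# Step 1: "look up" the base-ten logarithms of the two numbers.
a = math.log10(x)
b = math.log10(y)

# Step 2: add the logarithms, then take the inverse logarithm
# (i.e. raise 10 to that power).
product_via_logs = 10 ** (a + b)

print(product_via_logs)   # ~6996.78
print(x * y)              # the same, up to floating-point rounding
```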

Newcomb was a heavy user of his book of mathematical tables for this type of calculation, and it became very grubby and worn. But he also noticed that the first pages of the logarithms seemed to have been used much more than the others. This puzzled him greatly. Logarithm tables are presented in order of the first digit of the number required: the first pages therefore contain logarithms for numbers beginning with the digit 1. Newcomb used the tables for a vast range of different calculations of different things. He expected the first digits of the numbers he had to look up to be equally likely to be any digit. Shouldn’t they be randomly distributed? Shouldn’t all the pages be equally used?

Once raised, this puzzle faded away until it was re-discovered in 1938 and acquired the name of Benford’s law, or the first digit phenomenon. In virtually any list you can think of – street addresses, city populations, lengths of rivers, and so on – there are more entries beginning with the digit “1” than any other digit.

To give another example, although I admit this one is much harder to explain, in the American Physical Society’s list of fundamental constants, or at least the last version I happened to look at, no less than 40% begin with the digit 1. If you’ve been writing physics examination papers recently like I have, you will notice a similar behaviour. Out of the 16 physical constants listed in the rubric of a physics examination paper lying on my desk right now, 6 begin with the digit 1.
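If you want to try this yourself, the sketch below counts leading digits for an illustrative set of sixteen familiar constants in SI units. This is my own list, chosen purely for illustration (not necessarily the sixteen constants in the exam rubric mentioned above), with the usual approximate values.

```python
# An illustrative set of sixteen familiar constants in SI units
# (approximate values; my own choice, purely for illustration).
constants = {
    "c": 2.998e8, "G": 6.674e-11, "h": 6.626e-34, "hbar": 1.055e-34,
    "e": 1.602e-19, "k_B": 1.381e-23, "N_A": 6.022e23, "m_e": 9.109e-31,
    "m_p": 1.673e-27, "epsilon_0": 8.854e-12, "mu_0": 1.257e-6,
    "R": 8.314, "sigma": 5.670e-8, "Rydberg": 1.097e7,
    "a_0": 5.292e-11, "alpha": 7.297e-3,
}

def first_digit(value):
    """Leading non-zero digit of a positive number."""
    s = f"{value:e}"           # scientific notation, e.g. '2.998000e+08'
    return int(s[0])

counts = {}
for value in constants.values():
    d = first_digit(value)
    counts[d] = counts.get(d, 0) + 1

print(dict(sorted(counts.items())))
# On this particular list, 6 of the 16 values (about 38%) begin with 1.
```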

So what is going on?

There is a (relatively) simple answer, and a more complicated one. I’ll take the simple one first.

Consider street numbers in an address book as an example. Suppose any street is numbered from 1 to N. It doesn’t really matter what N is as long as it is finite (and nobody has ever built an infinitely long street). Now think about the first digits of the addresses. There are 9 possibilities, because we never start an address with 0. On the face of it, we might expect a fraction 1/9 (approximately 11%) of the addresses to start with 1. Suppose N is 200. What fraction actually starts with 1? The answer is more than 50%: 1 itself, 10 to 19, and everything from 100 to 199. Very few start with 9: only 9 itself, and 90-99 inclusive. If N is 300 then there are still more numbers beginning with 1 than with any other digit, and there are no more that start with 9. One only gets close to an equal fraction of each starting digit if the value of N is an exact power of 10, e.g. 1000.

Now you can see why pulling numbers out of an address book leads to a distribution of first digits that is not at all uniform. As long as the numbers are being drawn from a collection of streets, each of which has a finite upper limit, the result is bound to be biased towards low starting digits. Only if every street contained an exact power of ten addresses would the result be uniform. Every other possibility favours 1 at the start.
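Here is a short sketch that does the counting explicitly, tallying the leading digits of the house numbers 1 to N for a few different street lengths (the function name is mine):

```python
def leading_digit_fractions(n_max):
    """Fraction of the numbers 1..n_max whose decimal form starts
    with each digit 1-9."""
    counts = {d: 0 for d in range(1, 10)}
    for n in range(1, n_max + 1):
        counts[int(str(n)[0])] += 1
    return {d: c / n_max for d, c in counts.items()}

for n_max in (200, 300, 1000):
    fractions = leading_digit_fractions(n_max)
    print(n_max, {d: round(f, 2) for d, f in fractions.items()})
# For N = 200, just over half the numbers start with 1 and only about
# 5% start with 9; only when N is an exact power of ten do the
# fractions come out roughly equal.
```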

The more complicated version involves a scaling argument and is a more suitable explanation for the appearance of this phenomenon in measured physical quantities. Lengths, heights and weights of things are usually measured with respect to some reference quantity. In the absence of any other information, one might imagine that the distribution of whatever is being measured possesses some sort of invariance or symmetry with respect to the scale being chosen. In this case the prior distribution p(x) can be taken to have the so-called Jeffreys form, which is uniform in the logarithm, i.e. p(x) is proportional to 1/x. There must obviously be a cut-off at some point, because this distribution can’t be allowed to go on forever (it doesn’t converge for large x), but that doesn’t really matter for the sake of this argument. We can suppose, anyway, that there are many powers of ten involved before this upper limit is reached.

In this case the probability that the first digit is D is just given by the ratio of two terms: in the numerator we have the integral of p(x) between D and D+1 (a measure of how much of the distribution represents numbers starting with the digit D), and in the denominator we have the integral of p(x) between 1 and 10 (the overall measure). The result, if we take p(x) to be proportional to 1/x, is just log(1+1/D).
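Evaluating that formula for each digit takes only a couple of lines:

```python
import math

# First-digit probabilities from the scaling argument: P(D) = log10(1 + 1/D)
for d in range(1, 10):
    print(d, round(math.log10(1 + 1 / d), 3))
# 1 -> 0.301, 2 -> 0.176, ..., 9 -> 0.046: about 30% of first digits
# should be 1, falling to under 5% for the digit 9.
```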

[Figure: the distribution of first digits predicted by log(1+1/D), for D = 1 to 9.]

The shape of this distribution is shown in the Figure. Note that about 30% of the first digits are expected to be 1. Of course I have made a number of simplifying assumptions that are unlikely to be exactly true, and the case of the physical constants is complicated by the fact that some are measured and some are defined, but I think this captures the essential reason for the curious behaviour of first digits.

If nothing else, it provides a valuable lesson that you should be careful in what variables you assume are uniformly distributed!