Archive for the The Universe and Stuff Category

Et in Arcadia Lego…

Posted in The Universe and Stuff with tags Antikythera Mechanism, Lego, Mike Edmunds on December 11, 2010 by telescoper

The Antikythera Mechanism is a remarkable mechanical computer that’s thought to date from somewhere around 150 B.C. Our own Mike Edmunds is the lead academic on the Antikythera Mechanism Research Project, which has been studying this amazing artefact, so I thought he and other Cardiff folks would enjoy this, which shows a reproduction of the device made from Lego:

Deductivism and Irrationalism
Posted in Bad Statistics, The Universe and Stuff with tags Bayesian probability, David Hume, epistemology, induction, Karl Popper, ontology, Paul Feyerabend, philosophy, philosophy of science, Rudolf Carnap, Science, Thomas Kuhn on December 11, 2010 by telescoper

Looking at my stats I find that my recent introductory post about Bayesian probability has proved surprisingly popular with readers, so I thought I’d follow it up with a brief discussion of some of the philosophical issues surrounding it.
It is ironic that the pioneers of probability theory, principally Laplace, unquestionably adopted a Bayesian rather than a frequentist interpretation of their probabilities. Frequentism arose during the nineteenth century and held sway until recently. I recall giving a conference talk about Bayesian reasoning only to be heckled by the audience with comments about “new-fangled, trendy Bayesian methods”. Nothing could have been less apt. Probability theory pre-dates the rise of sampling theory and all the frequentist-inspired techniques that modern-day statisticians like to employ.
Most disturbing of all is the influence that frequentist and other non-Bayesian views of probability have had upon the development of a philosophy of science, which I believe has a strong element of inverse reasoning, or inductivism, in it. The argument about whether there is a role for this type of thought in science goes back at least as far as Roger Bacon, who lived in the 13th Century. Much later the brilliant Scottish empiricist philosopher and Enlightenment figure David Hume argued strongly against induction. Most modern anti-inductivist positions can be traced back to this source. Pierre Duhem argued that theory and experiment never meet face-to-face because in reality there are hosts of auxiliary assumptions involved in making this comparison. This is nowadays called the Quine-Duhem thesis.
Actually, for a Bayesian this doesn’t pose a logical difficulty at all. All one has to do is set up prior probability distributions for the required parameters, calculate their posterior probabilities and then integrate over those that aren’t of direct interest (the nuisance parameters). This is just an expanded version of the idea of marginalization, explained here.
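As a concrete illustration of marginalization, here is a minimal grid-based sketch for a single nuisance parameter; the toy model, parameter names and numbers are all my own assumptions rather than anything from a real analysis:

```python
# Toy illustration of marginalization: we want the posterior for a signal
# amplitude A, but the measurement also depends on an uninteresting
# background level B. Put priors on both, form the joint posterior on a
# grid, and sum ("integrate") over B. All numbers are invented.

import numpy as np

d, sigma = 3.2, 0.5                      # one hypothetical noisy datum d = A + B + noise

A = np.linspace(0.0, 6.0, 601)           # grid for the interesting parameter
B = np.linspace(0.0, 3.0, 301)           # grid for the nuisance parameter
AA, BB = np.meshgrid(A, B, indexing="ij")

prior_A = np.ones_like(A)                                # flat prior on A
prior_B = np.exp(-0.5 * ((B - 1.0) / 0.3) ** 2)          # assumed Gaussian prior on B

likelihood = np.exp(-0.5 * ((d - (AA + BB)) / sigma) ** 2)

joint = likelihood * prior_A[:, None] * prior_B[None, :] # unnormalized joint posterior
posterior_A = joint.sum(axis=1)                          # marginalize over B
posterior_A /= np.trapz(posterior_A, A)                  # normalize over A

print("Posterior mean of A:", np.trapz(A * posterior_A, A))
```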
Rudolf Carnap, a logical positivist, attempted to construct a complete theory of inductive reasoning which bears some relationship to Bayesian thought, but he failed to apply Bayes’ theorem in the correct way. Carnap distinguished between two types of probability – logical and factual. Bayesians don’t – and I don’t – think this is necessary. The Bayesian definition seems to me to be quite coherent on its own.
Other philosophers of science reject the notion that inductive reasoning has any epistemological value at all. This anti-inductivist stance, often somewhat misleadingly called deductivist (irrationalist would be a better description), is evident in the thinking of three of the most influential philosophers of science of the last century: Karl Popper, Thomas Kuhn and, most recently, Paul Feyerabend. Regardless of the ferocity of their arguments with each other, these thinkers have in common that at the core of their systems of thought lies the rejection of all forms of inductive reasoning. The line of thought that ended in this intellectual cul-de-sac began, as I stated above, with the work of the Scottish empiricist philosopher David Hume. For a thorough analysis of the anti-inductivists mentioned above and their obvious debt to Hume, see David Stove’s book Popper and After: Four Modern Irrationalists. I will just make a few inflammatory remarks here.
Karl Popper really began the modern era of the philosophy of science with his Logik der Forschung, which was published in 1934. There isn’t really much about (Bayesian) probability theory in this book, which is strange for a work which claims to be about the logic of science. Popper also managed, on the one hand, to accept probability theory (in its frequentist form) but, on the other, to reject induction. I therefore find it very hard to make sense of his work at all. It is also clear that, at least outside Britain, Popper is not really taken seriously by many people as a philosopher. Inside Britain it is very different and I’m not at all sure I understand why. Nevertheless, in my experience, most working physicists seem to subscribe to some version of Popper’s basic philosophy.
Among the things Popper has claimed is that all observations are “theory-laden” and that “sense-data, untheoretical items of observation, simply do not exist”. I don’t think it is possible to defend this view, unless one asserts that numbers do not exist. Data are numbers. They can be incorporated in the form of propositions about parameters in any theoretical framework we like. It is of course true that the possibility space is theory-laden. It is a space of theories, after all. Theory does suggest what kinds of experiment should be done and what data is likely to be useful. But data can be used to update probabilities of anything.
Popper has also insisted that science is deductive rather than inductive. Part of this claim is just a semantic confusion. It is necessary at some point to deduce what the measurable consequences of a theory might be before one does any experiments, but that doesn’t mean the whole process of science is deductive. He does, however, reject the basic application of inductive reasoning in updating probabilities in the light of measured data; he asserts that no theory ever becomes more probable when evidence is found in its favour. In his view, every scientific theory begins infinitely improbable and is doomed to remain so.
Now there is a grain of truth in this, or can be if the space of possibilities is infinite. Standard methods for assigning priors often spread the unit total probability over an infinite space, leading to a prior probability which is formally zero. This is the problem of improper priors. But this is not a killer blow to Bayesianism. Even if the prior is not strictly normalizable, the posterior probability can be. In any case, given sufficient relevant data the cycle of experiment-measurement-update of probability assignment usually soon leaves the prior far behind. Data usually count in the end.
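To see why an improper prior need not be fatal, consider a standard textbook illustration (my own example, not part of the original argument). Take a flat prior $p(\mu) \propto 1$ for a parameter $\mu$ over the whole real line, which cannot be normalized, together with a single Gaussian measurement $x$ of $\mu$ with known error $\sigma$. The posterior

$$ p(\mu|x) \propto p(x|\mu)\,p(\mu) \propto \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] $$

is a perfectly proper Gaussian in $\mu$, which normalizes to unity once divided by $\sqrt{2\pi}\,\sigma$.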
The idea by which Popper is best known is the dogma of falsification. According to this doctrine, a hypothesis is only said to be scientific if it is capable of being proved false. In real science certain “falsehood” and certain “truth” are almost never achieved. Theories are simply more probable or less probable than the alternatives on the market. The idea that experimental scientists struggle through their entire life simply to prove theorists wrong is a very strange one, although I definitely know some experimentalists who chase theories like lions chase gazelles. To a Bayesian, the right criterion is not falsifiability but testability, the ability of the theory to be rendered more or less probable using further data. Nevertheless, scientific theories generally do have untestable components. Any theory has its interpretation, which is the untestable baggage that we need to supply to make it comprehensible to us. But whatever can be tested can be scientific.
Popper’s work on the philosophical ideas that ultimately led to falsificationism began in Vienna, but the approach subsequently gained enormous popularity in western Europe. The American Thomas Kuhn later took up the anti-inductivist baton in his book The Structure of Scientific Revolutions. Kuhn is undoubtedly a first-rate historian of science and this book contains many perceptive analyses of episodes in the development of physics. His view of scientific progress is cyclic. It begins with a mass of confused observations and controversial theories, moves into a quiescent phase when one theory has triumphed over the others, and lapses into chaos again when further testing exposes anomalies in the favoured theory. Kuhn adopted the word paradigm to describe the model that rules during the middle stage.
The history of science is littered with examples of this process, which is why so many scientists find Kuhn’s account in good accord with their experience. But there is a problem when attempts are made to fuse this historical observation into a philosophy based on anti-inductivism. Kuhn claims that we “have to relinquish the notion that changes of paradigm carry scientists … closer and closer to the truth.” Einstein’s theory of relativity provides a closer fit to a wider range of observations than Newtonian mechanics, but in Kuhn’s view this success counts for nothing.
Paul Feyerabend has extended this anti-inductivist streak to its logical (though irrational) extreme. His approach has been dubbed “epistemological anarchism”, and it is clear that he believed that all theories are equally wrong. He is on record as stating that normal science is a fairytale, and that equal time and resources should be spent on “astrology, acupuncture and witchcraft”. He also categorised science alongside “religion, prostitution, and so on”. His thesis is basically that science is just one of many possible internally consistent views of the world, and that the choice between which of these views to adopt can only be made on socio-political grounds.
Feyerabend’s views could only have flourished in a society deeply disillusioned with science. Of course, many bad things have been done in science’s name, and many social institutions are deeply flawed. One can’t expect anything operated by people to run perfectly. It’s also quite reasonable to argue on ethical grounds which bits of science should be funded and which should not. But the bottom line is that science does have a firm methodological basis which distinguishes it from pseudo-science, the occult and new age silliness. Science is distinguished from other belief-systems by its rigorous application of inductive reasoning and its willingness to subject itself to experimental test. Not all science is done properly, of course, and bad science is as bad as anything.
The Bayesian interpretation of probability leads to a philosophy of science which is essentially epistemological rather than ontological. Probabilities are not “out there” in external reality, but in our minds, representing our imperfect knowledge and understanding. Scientific theories are not absolute truths. Our knowledge of reality is never certain, but we are able to reason consistently about which of our theories provides the best available description of what is known at any given time. If that description fails when more data are gathered, we move on, introducing new elements or abandoning the theory for an alternative. This process could go on forever. There may never be a final theory. But although the game might have no end, at least we know the rules….
(Guest Post) The GREAT10 Challenge
Posted in The Universe and Stuff with tags Cosmology, Tom Kitching, weak gravitational lensing on December 8, 2010 by telescoper

I haven’t had any guest posts for a while, so I was happy to respond to an offer from Tom Kitching to do one about the GREAT10 challenge. I’ve been working a bit on weak gravitational lensing myself recently – or rather my excellent and industrious postdoc Dipak Munshi has, and I’ve been struggling to keep up! Anyway, here’s Tom’s contribution…
–0–
This guest post is about the GREAT10 challenge, which was launched this week. I’ll briefly explain why this is important for cosmology, what the GREAT10 challenge is, and how you can take part. For more information please visit the website, or read the GREAT10 Handbook.
GREAT10 is focussed on weak gravitational lensing. This is an effect that distorts the shape of every galaxy we see, introducing a very small additional ellipticity to galaxy images. Weak lensing is an interesting cosmological probe because it can be used to measure both the rate of growth of structure and the geometry of the Universe. This enables extremely precise determinations of dark energy, dark matter and modified gravity. We can either use it to make maps of the dark matter distribution or to generate statistics, such as correlation functions, that depend sensitively on cosmological parameters.
As shown in the Figure (click it for a higher-resolution version), the weak lensing effect varies as a function of position (left; taken from Massey et al. 2007), which can be used to map dark matter (centre) or the correlation function of the shear can be constructed (right; taken from Fu et al. 2008).
However, the additional ellipticity induced by weak lensing generates only about a 1% change in the surface brightness profile of any galaxy, far too small to be seen by eye, so we need to extract this “shear” signal using software and analyse its effect statistically over many millions of galaxies. To make things more complicated, images contain noise, and are blurred by a PSF (or convolution kernel) caused by atmospheric turbulence and telescope effects.
So the image of a galaxy is sheared by the large scale structure, then blurred by the PSF of the atmosphere and telescope, and finally distorted further by being represented by pixels in a camera. Star images are not sheared, but are blurred by the PSF. The challenge is to measure the shear effect (which is small) in the presence of all these other complications.
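To make that chain concrete, here is a minimal toy sketch of the forward model in Python. It uses Gaussian profiles and invented numbers throughout (stamp size, shear, PSF width and noise level are all my assumptions, not GREAT10 values), with a crude moments-based shape estimate at the end:

```python
# Toy forward model for a weakly lensed galaxy image: shear -> PSF blur ->
# pixelation -> noise. Everything here is an illustrative assumption, not
# the GREAT10 simulation code.

import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
n, scale = 64, 0.2                         # postage stamp size (pixels) and arcsec/pixel
y, x = np.mgrid[0:n, 0:n] * scale
x -= x.mean(); y -= y.mean()

def sheared_gaussian(x, y, sigma, g1, g2):
    """Circular Gaussian profile evaluated on coordinates de-sheared by (g1, g2)."""
    xs = (1 - g1) * x - g2 * y             # inverse shear, to first order in g
    ys = -g2 * x + (1 + g1) * y
    return np.exp(-(xs**2 + ys**2) / (2 * sigma**2))

galaxy = sheared_gaussian(x, y, sigma=0.6, g1=0.02, g2=0.00)   # 2% shear
psf = np.exp(-(x**2 + y**2) / (2 * 0.35**2))                   # seeing-like blur
psf /= psf.sum()

image = fftconvolve(galaxy, psf, mode="same")                  # blur by the PSF
image += rng.normal(0.0, 0.02, image.shape)                    # pixel noise

# Crude shape estimate from unweighted second moments of the noisy image
w = np.clip(image, 0, None)
qxx = (w * x * x).sum() / w.sum()
qyy = (w * y * y).sum() / w.sum()
qxy = (w * x * y).sum() / w.sum()
e1 = (qxx - qyy) / (qxx + qyy)
e2 = 2 * qxy / (qxx + qyy)
print("measured ellipticity:", e1, e2)     # biased by the PSF unless corrected
```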
GREAT10 provides an environment in which algorithms and methods for measuring the shear, and dealing with the PSF, can be developed. GREAT10 is a public challenge, and we encourage everyone to take part; in particular, we encourage new ideas from different areas of astronomy, computer science and industry. The challenge contains two aspects:
- The Star Challenge: to reconstruct the Point Spread Function, or convolution kernel, in astronomical images, which arises from the slight blurring effects of the telescope and atmosphere. The PSF varies across each image and is only sparsely sampled by stars, which are pixelated and noisy. The challenge is to reconstruct the PSF at non-star positions (see the sketch after this list).
- The Galaxy Challenge: to measure the shapes of galaxies in order to reconstruct the gravitational lensing signal in the presence of noise and a known Point Spread Function. The signal is a very small change in the galaxies’ ellipticity: an exactly circular galaxy image would be changed into an ellipse; however, real galaxies are not circular to begin with. The challenge is to measure this effect over 52 million galaxies.
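And here, as promised above, is a minimal sketch of the kind of spatial interpolation the Star Challenge involves. It fits a low-order polynomial surface to a single made-up PSF property (its size) measured at star positions and predicts it at a non-star position; the real challenge requires reconstructing the full PSF, and all numbers here are invented:

```python
# Sketch of PSF interpolation across the field: fit a 2nd-order polynomial
# surface to a PSF property measured at sparse, noisy star positions, then
# evaluate it at an arbitrary (non-star) position. Purely illustrative.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical star positions (field coordinates 0..1) and a PSF size that
# in truth varies smoothly across the field, measured with noise.
xs, ys = rng.uniform(0, 1, 200), rng.uniform(0, 1, 200)
true_size = 0.30 + 0.05 * xs - 0.03 * ys + 0.04 * xs * ys
measured = true_size + rng.normal(0, 0.01, xs.size)

def design_matrix(x, y):
    """Second-order polynomial basis in the field coordinates."""
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

# Least-squares fit of the polynomial surface to the star measurements
coeffs, *_ = np.linalg.lstsq(design_matrix(xs, ys), measured, rcond=None)

# Predict the PSF size at a non-star (e.g. galaxy) position
xg, yg = np.array([0.4]), np.array([0.7])
print("interpolated PSF size:", design_matrix(xg, yg) @ coeffs)
```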
The challenges are run as a competition, and will run for 9 months. The prize for the winner is a trip to the final meeting at JPL, Pasadena, and an iPad or similar (sorry Peter! I know you don’t like Apple), but of course the real prize is the knowledge that you will have helped in creating the tools that will enable us to decipher the puzzle of understanding our Universe.
For more discussion on GREAT10 see MSNBC, WIRED and NASA.
–0–
EDITOR’S NOTE: I assume that second prize is two iPads…
The Sun’s not Behaving…
Posted in The Universe and Stuff with tags filament, NASA, Solar Dynamics Observatory, Sun on December 6, 2010 by telescoper

Check out this dramatic and slightly alarming picture of a huge filament emanating from the surface of the Sun, courtesy of NASA’s Solar Dynamics Observatory. The filament is about 700,000 km long, apparently – that’s an entire solar radius. It’s expected to collapse back into the Sun at some point, an event which should be rather exciting! For more details see here.

Even better, here’s a close-up animation.

It reminds me a bit of that Balrog thing in The Lord of the Rings that gave Gandalf such a good run for his money.
A Main Sequence for Galaxies?
Posted in Bad Statistics, The Universe and Stuff with tags astronomy, Cosmology, galaxies, Hertzsprung-Russell diagram, Mike Disney, Principal Components Analysis on December 2, 2010 by telescoper

Not for the first time in my life I find myself a bit of a laughing stock, after blowing my top during a seminar given at Cardiff yesterday by retired Professor Mike Disney. In fact I got so angry that, much to the amusement of my colleagues, I stormed out. I don’t often lose my temper, and am not proud of having done so, but I reached a point when the red mist descended. What caused it was bad science and, in particular, bad statistics. It was all a big pity because what could have been an interesting discussion of an interesting result was ruined by too many unjustified assertions and too little attention to the underlying basis of the science. I still believe that no matter how interesting the results are, it’s the method that really matters.
The interesting result that Mike Disney talked about emerges from a Principal Components Analysis (PCA) of the data relating to a sample of about 200 galaxies; it was actually published in Nature a couple of years ago, and the arXiv version is here. It was the misleading way this was discussed in the seminar that got me so agitated, so now that I’ve calmed down I’ll give my take on it and explain what I think is going on.
In fact, Principal Component Analysis is a very simple technique and shouldn’t really be controversial at all. It is a way of simplifying the representation of multivariate data by looking for the correlations present within it. To illustrate how it works, consider the following two-dimensional (i.e. bivariate) example I took from a nice tutorial on the method.

In this example the measured variables are Pressure and Temperature. When you plot them against each other you find they are correlated, i.e. the pressure tends to increase with temperature (or vice-versa). When you do a PCA of this type of dataset you first construct the covariance matrix (or, more precisely, its normalized form, the correlation matrix). Such matrices are always symmetric and square (i.e. N×N, where N is the number of variables measured at each point; in this case N=2). What the PCA does is to determine the eigenvalues and eigenvectors of the correlation matrix.
The eigenvectors for the example above are shown in the diagram – they are basically the major and minor axes of an ellipse drawn to fit the scatter plot; these two eigenvectors (and their associated eigenvalues) define the principal components as linear combinations of the original variables. Notice that along one principal direction (v1) there is much more variation than the other (v2). This means that most of the variance in the data set is along the direction indicated by the vector v1, and relatively little in the orthogonal direction v2; the eigenvalue for the first vector is consequently larger than that for the second.
The upshot of this is that the description of this (very simple) dataset can be compressed by using the first principal component rather than the original variables, i.e. by switching from the original two variables (pressure and temperature) to one variable (v1) we have compressed our description without losing much information (only the little bit that is involved in the scatter in the v2 direction).
In the more general case of N observables there will be N principal components, corresponding to vectors in an N-dimensional space, but nothing changes qualitatively. What the PCA does is to rank the eigenvectors according to their eigenvalue (i.e. the variance associated with the direction of the eigenvector). The first principal component is the one with the largest variance, and so on down the ordered list.
Where PCA is useful with large data sets is when the variance associated with the first (or first few) principal components is very much larger than the rest. In that case one can dispense with the N variables and just use one or two.
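For concreteness, here is a bare-bones sketch of those steps applied to made-up bivariate “pressure and temperature” data; it is only an illustration of the procedure, not anything from the paper under discussion:

```python
# Bare-bones PCA: form the correlation matrix of the variables, find its
# eigenvalues/eigenvectors, rank components by the variance they carry,
# and project onto the first component. The data are invented.

import numpy as np

rng = np.random.default_rng(42)
temperature = rng.normal(300.0, 20.0, 500)
pressure = 0.4 * temperature + rng.normal(0.0, 3.0, 500)   # correlated with T
data = np.column_stack([temperature, pressure])

# Correlation matrix of the two variables (2 x 2, symmetric)
corr = np.corrcoef(data, rowvar=False)

# Eigen-decomposition; eigh returns eigenvalues in ascending order, so re-rank
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("fraction of variance in each principal component:", eigvals / eigvals.sum())
print("first principal component (loadings on T and P):", eigvecs[:, 0])

# Compress: project the standardized data onto the first component only
z = (data - data.mean(axis=0)) / data.std(axis=0)
pc1 = z @ eigvecs[:, 0]
```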
In the cases discussed by Professor Disney yesterday the data involved six measurable parameters of each galaxy: (1) a dynamical mass estimate; (2) the mass inferred from HI emission (21cm); (3) the total luminosity; (4) radius; (5) a measure of the central concentration of the galaxy; and (6) a measure of its colour. The PCA of these data reveals that about 80% of the variance in the data set is associated with the first principal component, so there is clearly a significant correlation present in the data, although, to be honest, I have seen many PCA analyses with much stronger concentrations of variance in the first eigenvector, so this one doesn’t strike me as being particularly strong.
However, thinking as a physicist rather than a statistician there is clearly something very interesting going on. From a theoretical point of view one would imagine that the properties of an individual galaxy might be controlled by as many as six independent parameters including mass, angular momentum, baryon fraction, age and size, as well as by the accidents of its recent haphazard merger history.
Disney et al. argue that for gaseous galaxies to appear as a one-parameter set, as observed here, the theory of galaxy formation and evolution must supply at least five independent constraint equations in order to collapse everything into a single parameter.
This is all vaguely reminiscent of the Hertzsprung-Russell diagram, or at least the main sequence thereof:

You can see here that there’s a correlation between temperature and luminosity which constrains this particular bivariate data set to lie along a (nearly) one-dimensional track in the diagram. In fact these properties correlate with each other because there is a single parameter model relating all properties of main sequence stars to their mass. In other words, once you fix the mass of a main sequence star, it has a fixed luminosity, temperature, and radius (apart from variations caused by age, metallicity, etc). Of course the problem is that masses of stars are difficult to determine so this parameter is largely hidden from the observer. What is really happening is that luminosity and temperature correlate with each other, because they both depend on the hidden parameter mass.
I don’t think that the PCA result disproves the current theory of hierarchical galaxy formation (which is what Disney claims) but it will definitely be a challenge for theorists to provide a satisfactory explanation of the result! My own guess for the physical parameter that accounts for most of the variation in this data set is the mass of the dark halo within which the galaxy is embedded. In other words, it might really be just like the Hertzsprung-Russell diagram…
But back to my argument with Mike Disney. I asked what is the first principal component of the galaxy data, i.e. what does the principal eigenvector look like? He refused to answer, saying that it was impossible to tell. Of course it isn’t, as the PCA method actually requires it to be determined. Further questioning seemed to reveal a basic misunderstanding of the whole idea of PCA which made the assertion that all of modern cosmology would need to be revised somewhat difficult to swallow. At that point of deadlock, I got very angry and stormed out.
I realise that behind the confusion was a reasonable point. The first principal component is well-defined, i.e. v1 is completely well defined in the first figure. However, along the line defined by that vector, P and T are proportional to each other so in a sense only one of them is needed to specify a position along this line. But you can’t say on the basis of this analysis alone that the fundamental variable is either pressure or temperature; they might be correlated through a third quantity you don’t know about.
Anyway, as a postscript I’ll say I did go and apologize to Mike Disney afterwards for losing my rag. He was very forgiving, although I probably now have a reputation for being a grumpy old bastard. Which I suppose I am. He also said one other thing, that he didn’t mind me getting angry because it showed I cared about the truth. Which I suppose I do.
Doubts about the Evidence for Penrose’s Cyclic Universe
Posted in Bad Statistics, Cosmic Anomalies, The Universe and Stuff with tags arXiv:1011.3706, Cosmic Microwave Background, Cosmology, cyclic universe, Roger Penrose, V.G. Gurzadyan, WMAP on November 28, 2010 by telescoper

A strange paper by Gurzadyan and Penrose hit the arXiv a week or so ago. It seems to have generated quite a lot of reaction in the blogosphere and has now made it onto the BBC News, so I think it merits a comment.
The authors claim to have found evidence that supports Roger Penrose‘s conformal cyclic cosmology in the form of a series of (concentric) rings of unexpectedly low variance in the pattern of fluctuations in the cosmic microwave background seen by the Wilkinson Microwave Anisotropy Probe (WMAP). There’s no doubt that a real discovery of such signals in the WMAP data would point towards something radically different from the standard Big Bang cosmology.
I haven’t tried to reproduce Gurzadyan & Penrose’s result in detail, as I haven’t had time to look at it, and I’m not going to rule it out without doing a careful analysis myself. However, what I will say here is that I think you should take the statistical part of their analysis with a huge pinch of salt.
Here’s why.
The authors report a hugely significant detection of their effect (they quote a “6-σ” result); in other words, the feature they find would be expected to arise in the standard cosmological model with a probability of less than 10⁻⁷. The type of signal can be seen in their Figure 2, which I reproduce here:
Sorry they’re hard to read, but these show the variance measured on concentric rings (y-axis) of varying radius (x-axis) as seen in the WMAP W (94 GHz) and V (61 GHz) frequency channels (top two panels), compared with what is seen in a simulation with purely Gaussian fluctuations generated within the framework of the standard cosmological model (lower panel). The contrast looks superficially impressive, but there’s much less to it than meets the eye.
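To show roughly what kind of statistic is being plotted, here is a sketch of a ring-variance calculation on a small, flat, purely Gaussian toy patch; the choices (patch size, ring width, centre) are mine for illustration, and it ignores sky curvature, masking, foregrounds and the beam, all of which matter for the real analysis:

```python
# Sketch of the ring-variance statistic: for a chosen centre, compute the
# variance of map pixels in thin concentric annuli of increasing radius.
# Uses a random Gaussian patch as a stand-in for a CMB map.

import numpy as np

rng = np.random.default_rng(0)
npix = 256
patch = rng.normal(0.0, 1.0, (npix, npix))       # toy "map" patch

yc, xc = npix // 2, npix // 2                     # centre of the rings
y, x = np.indices(patch.shape)
r = np.hypot(x - xc, y - yc)

ring_width = 2.0                                  # pixels
radii = np.arange(ring_width, npix // 2, ring_width)
variances = []
for rad in radii:
    in_ring = (r >= rad - ring_width / 2) & (r < rad + ring_width / 2)
    variances.append(patch[in_ring].var())

# 'variances' vs 'radii' is the analogue of the curves in the figure;
# rings of unexpectedly low variance would show up as dips well below
# the typical level.
print(list(zip(radii[:5], variances[:5])))
```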
For a start, the separate WMAP W and V channels are not the same as the cosmic microwave background. There is a great deal of galactic foreground that has to be cleaned out of these maps before the pristine primordial radiation can be isolated. The fact that similar patterns can be found in the BOOMERANG data by no means rules out a foreground contribution as a common explanation of the anomalous variance. The authors have excluded the region at low galactic latitude (|b|<20°) in order to avoid the most heavily contaminated parts of the sky, but this is by no means guaranteed to eliminate foreground contributions entirely. Here is the all-sky WMAP W-band map, for example:

Moreover, these maps also contain considerable systematic effects arising from the scanning strategy of the WMAP satellite. The most obvious of these is that the signal-to-noise varies across the sky, but there are others, such as the finite size of the beam of the WMAP telescope.
Neither galactic foregrounds nor correlated noise are present in the Gaussian simulation shown in the lower panel, and the authors do not say what kind of beam smoothing is used either. The comparison of WMAP single-channel data with simple Gaussian simulations is consequently deeply flawed and the significance level quoted for the result is certainly meaningless.
Not having looked at this in detail myself, I’m not going to say that the authors’ conclusions are necessarily false, but I would be very surprised if an effect this large were real, given the strenuous efforts so many people have made to probe the detailed statistics of the WMAP data; see, e.g., various items in my blog category on cosmic anomalies. Cosmologists have been wrong before, of course, but then so have even eminent physicists like Roger Penrose…
Another point that I’m not sure about at all is whether, even if the rings of low variance are real – which I doubt – they really provide evidence of a cyclic universe. It doesn’t seem obvious to me that the model Penrose advocates would actually produce a CMB sky with such properties anyway.
Above all, I stress that this paper has not been subjected to proper peer review. If I were the referee I’d demand a much higher level of rigour in the analysis before I would allow it to be published in a scientific journal. Until the analysis is done satisfactorily, I suggest that serious students of cosmology shouldn’t get too excited by this result.
It occurs to me that other cosmologists out there might have looked at this result in more detail than I have had time to. If so, please feel free to add your comments in the box…
IMPORTANT UPDATE: 7th December. Two papers have now appeared on the arXiv (here and here) which refute the Gurzadyan-Penrose claim. Apparently, the data behave as Gurzadyan and Penrose claim, but so do proper simulations. In other words, it’s the bottom panel of the figure that’s wrong.
ANOTHER UPDATE: 8th December. Gurzadyan and Penrose have responded with a two-page paper which makes so little sense I had better not comment at all.
Ways of Thinking
Posted in Biographical, The Universe and Stuff with tags mathematics, Physics, Richard Feynman on November 25, 2010 by telescoper

I’m putting one more Richard Feynman clip up. This one struck me as particularly interesting, because it touches on a question I’ve often asked myself: what goes on in your head when you do mathematical calculations? I think I agree with Feynman’s suggestion that different people think in very different ways about the same kind of calculation or other activity.
There’s no doubt in my mind that I’ve become slower and slower at doing mathematics as I’ve got older, and probably less accurate too. I think that’s partly just age – and perhaps the cumulative effect of too much wine! – but it’s partly because I have so many other things to think about these days that it’s hard to spend long hours without interruption thinking about the same problem the way I could when I was a student or a postdoc.
In any case, although much of my research is mathematical, I’ve never really thought of myself as being in any sense a mathematical person. Many of my colleagues have much better technical skills in that regard than I’ve ever had. I was never particularly good at maths at school either. I was sufficiently competent at maths to do physics, of course, but I was much better at other things at that age. My best subject at O-level was Latin, for example, which possibly indicates that my brain prefers to work verbally (or perhaps symbolically) rather than, as no doubt many others’ do, geometrically or in some other abstract way.
Another strange thing is the role of vision in doing mathematics. I can’t do maths at all without writing things down on paper. I have to be able to see the equations to think about solving them. Amongst other things this makes it difficult when you’re working things out on a blackboard (or whiteboard); you have to write symbols so large that your field of view can’t take in a whole equation. I often have to step back up one of the aisles to get a good look at what I’m doing like that. Other physicists – notably Stephen Hawking – obviously manage without writing things down at all. I find it impossible to imagine having that ability.
But I endorse what Richard Feynman says at the beginning of the clip. It’s really all about being interested in the questions, which gives you the motivation to acquire the skills needed to find the answers. I think of it as being like music. If you’re drawn into the world of music, even if you’re talented you have to practice for long hours before you can really play an instrument. Few can reach the level of Feynman (or a concert pianist), of course – I’m certainly not in either of those categories! – but I think physics is at least as much perspiration as inspiration.
In contrast to many of my colleagues I’m utterly hopeless at chess – and other games that require very sophisticated pattern-reading skills – but good at crosswords and word-puzzles. Maybe I’m in the wrong job?
A Little Bit of Bayes
Posted in Bad Statistics, The Universe and Stuff with tags Bayesian probability, Big Bang, Cosmology, Frequentist, inductive reasoning, maximum likelihood, Richard Cox on November 21, 2010 by telescoper

I thought I’d start a series of occasional posts about Bayesian probability. This is something I’ve touched on from time to time, but it’s perhaps worth covering this relatively controversial topic in a slightly more systematic fashion, especially with regard to how it works in cosmology.
I’ll start with Bayes’ theorem which, for three logical propositions (such as statements about the values of parameters in a theory) A, B and C, can be written in the form

$$ P(B|AC) = K^{-1} P(A|BC)\, P(B|C), $$

where

$$ K = P(A|C). $$
This is (or should be!) uncontroversial as it is simply a result of the sum and product rules for combining probabilities. Notice, however, that I’ve not restricted it to two propositions A and B as is often done, but carried throughout an extra one (C). This is to emphasize the fact that, to a Bayesian, all probabilities are conditional on something; usually, in the context of data analysis this is a background theory that furnishes the framework within which measurements are interpreted. If you say this makes everything model-dependent, then I’d agree. But every interpretation of data in terms of parameters of a model is dependent on the model. It has to be. If you think it can be otherwise then I think you’re misguided.
In the equation, P(B|C) is the probability of B being true, given that C is true. The information C need not be definitely known, but perhaps assumed for the sake of argument. The left-hand side of Bayes’ theorem denotes the probability of B given both A and C, and so on. The presence of C has not changed anything, but is just there as a reminder that it all depends on what is being assumed in the background. The equation states a theorem that can be proved to be mathematically correct so it is – or should be – uncontroversial.
Now comes the controversy. In the “frequentist” interpretation of probability, the entities A, B and C would be interpreted as “events” (e.g. the coin is heads) or “random variables” (e.g. the score on a dice, a number from 1 to 6) attached to which is their probability, indicating their propensity to occur in an imagined ensemble. These things are quite complicated mathematical objects: they don’t have specific numerical values, but are represented by a measure over the space of possibilities. They are sort of “blurred-out” in some way, the fuzziness representing the uncertainty in the precise value.
To a Bayesian, the entities A, B and C have a completely different character to what they represent for a frequentist. They are not “events” but logical propositions which can only be either true or false. The entities themselves are not blurred out, but we may have insufficient information to decide which of the two possibilities is correct. In this interpretation, P(A|C) represents the degree of belief that it is consistent to hold in the truth of A given the information C. Probability is therefore a generalization of the “normal” deductive logic expressed by Boolean algebra: the value “0” is associated with a proposition which is false and “1” denotes one that is true. Probability theory extends this logic to the intermediate case where there is insufficient information to be certain about the status of the proposition.
A common objection to Bayesian probability is that it is somehow arbitrary or ill-defined. “Subjective” is the word that is often bandied about. This is only fair to the extent that different individuals may have access to different information and therefore assign different probabilities. Given different information C and C′ the probabilities P(A|C) and P(A|C′) will be different. On the other hand, the same precise rules for assigning and manipulating probabilities apply as before. Identical results should therefore be obtained whether these are applied by any person, or even a robot, so that part isn’t subjective at all.
In fact I’d go further. I think one of the great strengths of the Bayesian interpretation is precisely that it does depend on what information is assumed. This means that such information has to be stated explicitly. The essential assumptions behind a result can be – and, regrettably, often are – hidden in frequentist analyses. Being a Bayesian forces you to put all your cards on the table.
To a Bayesian, probabilities are always conditional on other assumed truths. There is no such thing as an absolute probability, hence my alteration of the form of Bayes’s theorem to represent this. A probability such as P(A) has no meaning to a Bayesian: there is always conditioning information. For example, if I blithely assign a probability of 1/6 to each face of a dice, that assignment is actually conditional on me having no information to discriminate between the appearance of the faces, and no knowledge of the rolling trajectory that would allow me to make a prediction of its eventual resting position.
In the Bayesian framework, probability theory becomes not a branch of experimental science but a branch of logic. Like any branch of mathematics it cannot be tested by experiment but only by the requirement that it be internally self-consistent. This brings me to what I think is one of the most important results of twentieth century mathematics, but which is unfortunately almost unknown in the scientific community. In 1946, Richard Cox derived the unique generalization of Boolean algebra under the assumption that such a logic must involve associating a single number with any logical proposition. The result he got is beautiful and anyone with any interest in science should make a point of reading his elegant argument. It turns out that the only way to construct a consistent logic of uncertainty incorporating this principle is by using the standard laws of probability. There is no other way to reason consistently in the face of uncertainty than probability theory. Accordingly, probability theory always applies when there is insufficient knowledge for deductive certainty. Probability is inductive logic.
This is not just a nice mathematical property. This kind of probability lies at the foundations of a consistent methodological framework that not only encapsulates many common-sense notions about how science works, but also puts at least some aspects of scientific reasoning on a rigorous quantitative footing. This is an important weapon that should be used more often in the battle against the creeping irrationalism one finds in society at large.
I posted some time ago about an alternative way of deriving the laws of probability from consistency arguments.
To see how the Bayesian approach works, let us consider a simple example. Suppose we have a hypothesis H (some theoretical idea that we think might explain some experiment or observation). We also have access to some data D, and we also adopt some prior information I (which might be the results of other experiments or simply working assumptions). What we want to know is how strongly the data D support the hypothesis H given our background assumptions I. To keep it easy, we assume that the choice is between whether H is true or H is false. In the latter case, “not-H” or H′ (for short) is true. If our experiment is at all useful we can construct P(D|HI), the probability that the experiment would produce the data set D if both our hypothesis and the conditional information are true.
The probability P(D|HI) is called the likelihood; to construct it we need to have some knowledge of the statistical errors produced by our measurement. Using Bayes’ theorem we can “invert” this likelihood to give P(H|DI), the probability that our hypothesis is true given the data and our assumptions. The result looks just like we had in the first two equations:

$$ P(H|DI) = K^{-1} P(H|I)\, P(D|HI), $$

where

$$ K = P(D|I). $$
Now we can expand the “normalising constant” K because we know that either H or H′ must be true. Thus

$$ K = P(D|I) = P(D|HI)\,P(H|I) + P(D|H'I)\,P(H'|I). $$
The P(H|DI) on the left-hand side of the first expression is called the posterior probability; the right-hand side involves P(H|I), which is called the prior probability, and the likelihood P(D|HI). The principal controversy surrounding Bayesian inductive reasoning involves the prior and how to define it, which is something I’ll comment on in a future post.
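Here is a toy numerical version of that recipe, with invented numbers, just to show how the prior, likelihood and normalising constant combine:

```python
# Toy example of the recipe above: two rival hypotheses H and H' (not-H),
# a prior for each, and likelihoods P(D|HI), P(D|H'I) from some assumed
# error model. The numbers are invented purely to show the arithmetic.

prior_H = 0.5                     # P(H|I): no initial preference
prior_notH = 1.0 - prior_H        # P(H'|I)

likelihood_H = 0.20               # P(D|HI): probability of the data if H is true
likelihood_notH = 0.05            # P(D|H'I): probability of the data if H is false

# Expanded normalising constant K = P(D|I)
K = likelihood_H * prior_H + likelihood_notH * prior_notH

posterior_H = likelihood_H * prior_H / K
print("P(H|DI) =", posterior_H)   # 0.8: the data favour H four-to-one
```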
The Bayesian recipe for testing a hypothesis assigns a large posterior probability to a hypothesis for which the product of the prior probability and the likelihood is large. It can be generalized to the case where we want to pick the best of a set of competing hypotheses, say H1, …, Hn. Note that this need not be the set of all possible hypotheses, just those that we have thought about. We can only choose from what is available. The hypotheses may be relatively simple, such as that some particular parameter takes the value x, or they may be composite, involving many parameters and/or assumptions. For instance, the Big Bang model of our universe is a very complicated hypothesis, or in fact a combination of hypotheses joined together, involving at least a dozen parameters which can’t be predicted a priori but which have to be estimated from observations.
The required result for multiple hypotheses is pretty straightforward: the sum over the two alternatives involved in K above simply becomes a sum over all possible hypotheses, so that

$$ K = P(D|I) = \sum_j P(D|H_j I)\,P(H_j|I) $$

and

$$ P(H_i|DI) = K^{-1} P(H_i|I)\,P(D|H_i I). $$
If the hypothesis concerns the value of a parameter – in cosmology this might be, e.g., the mean density of the Universe expressed by the density parameter Ω0 – then the allowed space of possibilities is continuous. The sum in the denominator should then be replaced by an integral, but conceptually nothing changes. Our “best” hypothesis is the one that has the greatest posterior probability.
From a frequentist stance the procedure is often instead simply to maximize the likelihood. According to this approach the best theory is the one that makes the data most probable. This can be the same as the most probable theory, but only if the prior probability is constant; in general, the probability of a model given the data is not the same as the probability of the data given the model. I’m amazed how many practising scientists make this error on a regular basis.
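A small grid-based toy calculation (again with invented numbers) illustrates the difference: unless the prior is flat, the parameter value that maximizes the likelihood is not the one that maximizes the posterior:

```python
# Toy grid calculation contrasting maximum likelihood with the Bayesian
# posterior for a single parameter (think of something like a density
# parameter). The prior and likelihood are invented for illustration.

import numpy as np

omega = np.linspace(0.0, 2.0, 2001)                       # parameter grid

# Likelihood from a hypothetical measurement: 1.10 +/- 0.20
likelihood = np.exp(-0.5 * ((omega - 1.10) / 0.20) ** 2)

# A non-flat prior, say from earlier experiments: 0.90 +/- 0.10
prior = np.exp(-0.5 * ((omega - 0.90) / 0.10) ** 2)

posterior = likelihood * prior
posterior /= np.trapz(posterior, omega)                   # the integral normalization

print("maximum-likelihood value:", omega[np.argmax(likelihood)])   # ~1.10
print("maximum-posterior value: ", omega[np.argmax(posterior)])    # pulled towards 0.90
```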
The following figure might serve to illustrate the difference between the frequentist and Bayesian approaches. In the former case, everything is done in “data space” using likelihoods, and in the other we work throughout with probabilities of hypotheses, i.e. we think in hypothesis space. I find it interesting to note that most theorists that I know who work in cosmology are Bayesians and most observers are frequentists!

As I mentioned above, it is the presence of the prior probability in the general formula that is the most controversial aspect of the Bayesian approach. The attitude of frequentists is often that this prior information is completely arbitrary or at least “model-dependent”. Being empirically-minded people, by and large, they prefer to think that measurements can be made and interpreted without reference to theory at all.
Assuming we can assign the prior probabilities in an appropriate way, what emerges from the Bayesian framework is a consistent methodology for scientific progress. The scheme starts with the hardest part – theory creation. This requires human intervention, since we have no automatic procedure for dreaming up hypotheses from thin air. Once we have a set of hypotheses, we need data against which theories can be compared using their relative probabilities. The experimental testing of a theory can happen in many stages: the posterior probability obtained after one experiment can be fed in, as a prior, to the next. The order of experiments does not matter. This all happens in an endless loop, as models are tested and refined by confrontation with experimental discoveries, and are forced to compete with new theoretical ideas. Often one particular theory emerges as most probable for a while, such as in particle physics where a “standard model” has been in existence for many years. But this does not make it absolutely right; it is just the best bet amongst the alternatives. Likewise, the Big Bang model does not represent the absolute truth, but is just the best available model in the face of the manifold relevant observations we now have concerning the Universe’s origin and evolution. The crucial point about this methodology is that it is inherently inductive: all the reasoning is carried out in “hypothesis space” rather than “observation space”. The primary form of logic involved is not deduction but induction. Science is all about inverse reasoning.
For comments on induction versus deduction in another context, see here.
So what are the main differences between the Bayesian and frequentist views?
First, I think it is fair to say that the Bayesian framework is enormously more general than is allowed by the frequentist notion that probabilities must be regarded as relative frequencies in some ensemble, whether that is real or imaginary. In the latter interpretation, a proposition is at once true in some elements of the ensemble and false in others. It seems to me to be a source of great confusion to substitute a logical AND for what is really a logical OR. The Bayesian stance is also free from problems associated with the failure to incorporate in the analysis any information that can’t be expressed as a frequency. Would you really trust a doctor who said that 75% of the people she saw with your symptoms required an operation, but who did not bother to look at your own medical files?
As I mentioned above, frequentists tend to talk about “random variables”. This takes us into another semantic minefield. What does “random” mean? To a Bayesian there are no random variables, only variables whose values we do not know. A random process is simply one about which we only have sufficient information to specify probability distributions rather than definite values.
More fundamentally, since the combination rules for probabilities were derived by Cox uniquely from the requirement of logical consistency, any departure from these rules will generally speaking involve logical inconsistency. Many of the standard statistical data analysis techniques used when the data consist of repeated samples of a variable having a definite but unknown value (simple “unbiased estimators”, for example) are not equivalent to Bayesian reasoning. These methods can, of course, give good answers, but they can all be made to look completely silly by a suitable choice of dataset.
By contrast, I am not aware of any example of a paradox or contradiction that has ever been found using the correct application of Bayesian methods, although the methods can of course be applied incorrectly. Furthermore, in order to deal with unique events like the weather, frequentists are forced to introduce the notion of an ensemble, a perhaps infinite collection of imaginary possibilities, to allow them to retain the notion that probability is a proportion. Provided the calculations are done correctly, the results of these calculations should agree with the Bayesian answers. On the other hand, frequentists often talk about the ensemble as if it were real, and I think that is very dangerous…
The Inconceivable Nature of Nature
Posted in The Universe and Stuff with tags electromagnetic waves, Physics, Richard Feynman on November 19, 2010 by telescoper

I had a couple of requests to post yet another Feynman clip. This one – about electromagnetic waves and swimming pools – is one that I vividly remember watching on BBC when it was first broadcast donkeys’ years ago. I think it’s totally wonderful.
At It
Posted in Poetry, The Universe and Stuff with tags A.S. Eddington, Poetry, R. S. Thomas on November 18, 2010 by telescoper

Apologies for my posts being a bit thin on original content recently. There’s a lot going on at the moment and it has not been easy to find the time to write at any length. Before too long I hope to be able to get back into the swing of things and maybe even blog about science. Or even do some! In the meantime, however, I couldn’t resist passing on this poem, called At It, by R. S. Thomas. I’ve posted some of his verse on previous occasions, but I only found this one a few days ago and couldn’t resist sharing it, not least because it mentions Sir Arthur Eddington (probably in a reference to one of his popular science books).
I think he sits at that strange table
of Eddington’s. That is not a table
at all, but nodes and molecules
pushing against molecules
and nodes; and he writes there
in invisible handwriting the instructions
the genes follow. I imagine his
face that is more the face
of a clock, and the time told by it
is now, though Greece is referred
to and Egypt and empires
not yet begun.
And I would have
things to say to this God
at the judgement, storming at him,
as Job stormed with the eloquence
of the abused heart. But there will
be no judgement other than the verdict
of his calculations, that abstruse
geometry that proceeds eternally
in the silence beyond right and wrong.



