Archive for Cosmology

My Fellow Pagans …

Posted in The Universe and Stuff with tags , , on January 5, 2011 by telescoper

I was reminded yesterday of the following clipping, which I found in The Times, in 1999, just before the total eclipse that was visible from parts of the United Kingdom in that year. It was a feature about the concerns raised by certain residents of Cornwall about the possible effects of the sudden influx of visitors on the local community. Here is a scan  of a big chunk of the story, which you probably can’t read…

…and here is a blow-up of the section shown in the red box, which places cosmologists in rather strange company:

This makes it clear what journalists on this rag think about cosmology! In protest, I wrote a letter to The Times saying that, as a cosmologist, I thought this piece was very insulting … to Druids.

They didn’t publish it.



Insignificance

Posted in The Universe and Stuff with tags , , , , , , , on January 4, 2011 by telescoper

I’m told that there was a partial eclipse of the Sun visible from the UK this morning, although it was so cloudy here in Cardiff that I wouldn’t have seen anything even if I had bothered to get up in time to observe it. For more details of the event and pictures from people who managed to see it, see here. There’s also a nice article on the BBC website. The BBC are coordinating three days of programmes, called Stargazing Live, alongside a host of other events, presumably timed to coincide with this morning’s eclipse. It’s taking a chance to do live broadcasts about astronomy given the British weather, but I hope they are successful in generating interest, especially among the young.

As a spectacle a partial solar eclipse is pretty exciting – as long as it’s not cloudy – but even a full view of one can’t really be compared with the awesome event that is a total eclipse. I’m lucky enough to have observed one and I can tell you it was truly awe-inspiring.

If you think about it, though, it’s very strange that such an event is possible at all. In a total eclipse, the Moon passes between the Earth and the Sun in such a way that it exactly covers the Solar disk. In order for this to happen the apparent angular size of the Moon (as seen from Earth) has to be almost exactly the same as that of the Sun (as seen from Earth). This involves a strange coincidence: the Moon is small (about 1740 km in radius) but very close to the Earth in astronomical terms (about 400,000 km away). The Sun, on the other hand, is both enormously large (radius 700,000 km) and enormously distant (approx. 150,000,000 km). The ratio of radius to distance from Earth is almost identical for the two objects at the time of a total eclipse, so the apparent disk of the Moon almost exactly fits over that of the Sun. Why is this so?
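As a quick sanity check, here is a back-of-the-envelope version of that comparison in Python (a sketch using the rough figures quoted above; they are approximate values, not precise ephemeris data):

    import math

    # Rough figures quoted above (km); approximate, not ephemeris-quality
    moon_radius, moon_distance = 1.74e3, 4.0e5
    sun_radius, sun_distance = 7.0e5, 1.5e8

    # Small-angle approximation: apparent angular radius ~ radius / distance
    moon_angle = moon_radius / moon_distance
    sun_angle = sun_radius / sun_distance

    print(f"Moon: {math.degrees(moon_angle):.3f} deg, Sun: {math.degrees(sun_angle):.3f} deg")
    print(f"Ratio of apparent sizes: {moon_angle / sun_angle:.2f}")
    # Both come out at roughly a quarter of a degree, with a ratio close to 1;
    # that is the coincidence that makes total eclipses possible.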

The simple answer is that it is just a coincidence. There seems no particular physical reason why the geometry of the Earth-Moon-Sun system should have turned out this way. Moreover, the system is not static. The tides raised by the Moon on the Earth lead to frictional heating and a loss of orbital energy. The Moon’s orbit  is therefore moving slowly outwards from the Earth. I’m not going to tell you exactly how quickly this happens, as it is one of the questions I set my students in the module Astrophysical Concepts I’ll be starting in a few weeks, but eventually the Earth-Moon distance will be too large for total eclipses of the Sun by the Moon to be possible on Earth, although partial and annular eclipses may still be possible.

It seems therefore that we just happen to be living at the right place at the right time to see total eclipses. Perhaps there are other inhabited moonless planets whose inhabitants will never see one. Future inhabitants of Earth will have to content themselves with watching eclipse clips on YouTube.

Things may be more complicated than this though. I’ve heard it argued that the existence of a moon reasonably close to the Earth may have helped the evolution of terrestrial life. The argument – as far as I understand it – is that life presumably began in the oceans, then amphibious forms evolved in tidal margins of some sort wherein conditions favoured both aquatic and land-dwelling creatures. Only then did life fully emerge from the seas and begin to live on land. If it is the case that the existence of significant tides is necessary for life to complete the transition from oceans to solid ground, then maybe the Moon played a key role in the evolution of dinosaurs, mammals, and even ourselves.

I’m not sure I’m convinced by this argument because, although the Moon is the dominant source of the Earth’s tides, it is not overwhelmingly so. The effect of the Sun is also considerable, only a factor of three smaller than that of the Moon. So maybe the Sun could have done the job on its own. I don’t know.

That’s not really the point of this post, however. What I wanted to comment on is that astronomers basically don’t question the interpretation of the occurrence of total eclipses as simply a coincidence. Eclipses just are. There are no doubt many other planets where they aren’t. We’re special in that we live somewhere where something apparently unlikely happens. But this isn’t important, because eclipses aren’t really all that significant in cosmic terms, other than that the laws of physics allow them.

On the other hand astronomers (and many other people) do make a big deal of the fact that life exists in the Universe. Given what  we know about fundamental physics and biology – which admittedly isn’t very much – this also seems unlikely. Perhaps there are many other worlds without life, so the Earth is special once again. Others argue that the existence of life is so unlikely that special provision must have been made to make it possible.

Before I find myself falling into the black hole marked “Anthropic Principle” let me just say that I don’t see the existence of life (including human life) as being of any greater significance than that of a total eclipse. Both phenomena are (subjectively) interesting to humans, both are contingent on particular circumstances, and both will no doubt cease to occur at some point in the perhaps not-too-distant future. Neither tells us much about the true nature of the Universe.

Let’s face it. We’re just not significant.



(Guest Post) The GREAT10 Challenge

Posted in The Universe and Stuff with tags , , on December 8, 2010 by telescoper

I haven’t had any guest posts for a while, so I was happy to respond to an offer from Tom Kitching to do one about the GREAT10 challenge. I’ve been working a bit on weak gravitational lensing myself recently – or rather my excellent and industrious postdoc Dipak Munshi has, and I’ve been struggling to keep up! Anyway, here’s Tom’s contribution…

–0–

This guest post is about the GREAT10 challenge, which was launched this week. I’ll briefly explain why this is important for cosmology, what the GREAT10 challenge is, and how you can take part. For more information please visit the website, or read the GREAT10 Handbook.

GREAT10 is focussed on weak gravitational lensing. This is an effect that distorts the shape of every galaxy we see, introducing a very small additional ellipticity to galaxy images. Weak lensing is an interesting cosmological probe because it can be used to measure both the rate of growth of structure and the geometry of the Universe. This enables extremely precise determinations of dark energy, dark matter and modified gravity. We can either use it to make maps of the dark matter distribution or to generate statistics, such as correlation functions, that depend sensitively on cosmological parameters.

As shown in the Figure (click it for a higher-resolution version), the weak lensing effect varies as a function of position (left; taken from Massey et al. 2007), which can be used to map dark matter (centre) or the correlation function of the shear can be constructed (right; taken from Fu et al. 2008).

However, the additional ellipticity induced by weak lensing generates only about a 1% change in the surface brightness profile of any galaxy, far too small to be seen by eye, so we need to extract this “shear” signal using software and analyse its effect statistically over many millions of galaxies. To make things more complicated, images contain noise, and are blurred by a PSF (or convolution kernel) caused by atmospheric turbulence and telescope effects.

So the image of a galaxy is sheared by the large scale structure, then blurred by the PSF of the atmosphere and telescope, and finally distorted further by being represented by pixels in a camera. Star images are not sheared, but are blurred by the PSF. The challenge is to measure the shear effect (which is small) in the presence of all these other complications.
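To make that sequence of operations concrete, here is a toy forward model in Python (numpy and scipy). The galaxy profile, shear value, PSF width and noise level are all invented for illustration and are far simpler than the real GREAT10 simulations:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    rng = np.random.default_rng(42)
    n = 64
    y, x = np.mgrid[:n, :n] - n / 2.0          # pixel grid centred on the postage stamp

    # 1. A circular Gaussian galaxy, sheared slightly along the x-axis (g1 = 0.05)
    g1, sigma_gal, sigma_psf = 0.05, 4.0, 2.0
    galaxy = np.exp(-(((1 - g1) * x) ** 2 + ((1 + g1) * y) ** 2) / (2 * sigma_gal ** 2))

    # 2. Blur by the PSF (atmosphere + telescope), then 3. add pixel noise
    image = gaussian_filter(galaxy, sigma_psf) + rng.normal(scale=0.005, size=(n, n))

    def e1_moments(img, weight=None):
        """Ellipticity component e1 from (optionally weighted) second moments."""
        I = img if weight is None else img * weight
        Ixx = np.sum(I * x * x) / np.sum(I)
        Iyy = np.sum(I * y * y) / np.sum(I)
        return (Ixx - Iyy) / (Ixx + Iyy)

    w = np.exp(-(x ** 2 + y ** 2) / (2 * 6.0 ** 2))   # weight to suppress noise at large radii
    print("ellipticity before PSF and noise:", round(e1_moments(galaxy), 3))
    print("naive measurement afterwards:    ", round(e1_moments(image, w), 3))
    # The PSF (and the weight needed to control the noise) roughly halves the measured
    # ellipticity; correcting biases of exactly this kind is what GREAT10 methods must do.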

GREAT10 provides an environment in which algorithms and methods for measuring the shear, and dealing with the PSF, can be developed. GREAT10 is a public challenge, and we encourage everyone to take part; in particular we encourage new ideas from different areas of astronomy, computer science and industry. The challenge contains two aspects:

  • The Star Challenge: to reconstruct the Point Spread Function, or convolution kernel, in astronomical images, which arises from the slight blurring effects of the telescope and atmosphere. The PSF varies across each image and is only sparsely sampled by stars, which are pixelated and noisy. The challenge is to reconstruct the PSF at non-star positions.
  • The Galaxy Challenge: to measure the shapes of galaxies in order to reconstruct the gravitational lensing signal in the presence of noise and a known Point Spread Function. The signal is a very small change in the galaxies’ ellipticity: an exactly circular galaxy image would be changed into an ellipse; real galaxies, however, are not circular. The challenge is to measure this effect over 52 million galaxies.

The challenges are run as a competition lasting 9 months. The prize for the winner is a trip to the final meeting at JPL, Pasadena, and an iPad or similar (sorry Peter! I know you don’t like Apple), but of course the real prize is the knowledge that you will have helped to create the tools that will enable us to decipher the puzzles of our Universe.

For more discussion on GREAT10 see MSNBC, WIRED and NASA.

–0–

EDITOR’S NOTE: I assume that second prize is two iPads…



A Main Sequence for Galaxies?

Posted in Bad Statistics, The Universe and Stuff with tags , , , , , on December 2, 2010 by telescoper

Not for the first time in my life I find myself a bit of a laughing stock, after blowing my top during a seminar at Cardiff yesterday by retired Professor Mike Disney. In fact I got so angry that, much to the amusement of my colleagues, I stormed out. I don’t often lose my temper, and am not proud of having done so, but I reached a point when the red mist descended. What caused it was bad science and, in particular, bad statistics. It was all a big pity because what could have been an interesting discussion of an interesting result was ruined by too many unjustified assertions and too little attention to the underlying basis of the science. I still believe that no matter how interesting the results are, it’s  the method that really matters.

The interesting result that Mike Disney talked about emerges from a Principal Components Analysis (PCA) of the data relating to a sample of about 200 galaxies; it was actually published in Nature a couple of years ago; the arXiv version is here. It was the misleading way this was discussed in the seminar that got me so agitated so I’ll give my take on it now that I’ve calmed down to explain what I think is going on.

In fact, Principal Component Analysis is a very simple technique and shouldn’t really be controversial at all. It is a way of simplifying the representation of multivariate data by looking for the correlations present within it. To illustrate how it works, consider the following two-dimensional (i.e. bivariate) example I took from a nice tutorial on the method.

In this example the measured variables are Pressure and Temperature. When you plot them against each other you find they are correlated, i.e. the pressure tends to increase with temperature (or vice versa). When you do a PCA of this type of dataset you first construct the covariance matrix (or, more precisely, its normalized form, the correlation matrix). Such matrices are always symmetric and square (i.e. N×N, where N is the number of variables measured at each point; in this case N=2). What the PCA does is to determine the eigenvalues and eigenvectors of the correlation matrix.

The eigenvectors for the example above are shown in the diagram – they are basically the major and minor axes of an ellipse drawn to fit the scatter plot; these two eigenvectors (and their associated eigenvalues) define the principal components as linear combinations of the original variables. Notice that along one principal direction (v1) there is much more variation than the other (v2). This means that most of the variance in the data set is along the direction indicated by the vector v1, and relatively little in the orthogonal direction v2; the eigenvalue for the first vector is consequently larger than that for the second.

The upshot of this is that the description of this (very simple) dataset can be compressed by using the first principal component rather than the original variables, i.e. by switching from the original two variables (pressure and temperature) to one variable (v1) we have compressed our description without losing much information (only the little bit that is involved in the scatter in the v2 direction).

In the more general case of N observables there will be N principal components, corresponding to vectors in an N-dimensional space, but nothing changes qualitatively. What the PCA does is to rank the eigenvectors according to their eigenvalue (i.e. the variance associated with the direction of the eigenvector). The first principal component is the one with the largest variance, and so on down the ordered list.

Where PCA is useful with large data sets is when the variance associated with the first (or first few) principal components is very much larger than the rest. In that case one can dispense with the N variables and just use one or two.
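If you want to see the machinery in action, here is a minimal sketch of the procedure in Python using only numpy; the pressure/temperature “data” are synthetic numbers invented purely for illustration:

    import numpy as np

    # Synthetic bivariate data, loosely along the lines of the tutorial example above
    rng = np.random.default_rng(0)
    temperature = rng.normal(300.0, 20.0, size=500)
    pressure = 1.5 * temperature + rng.normal(0.0, 10.0, size=500)   # correlated with T

    data = np.column_stack([temperature, pressure])
    standardised = (data - data.mean(axis=0)) / data.std(axis=0)

    corr = np.corrcoef(standardised, rowvar=False)      # the 2x2 correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(corr)    # symmetric matrix, so eigh is fine

    order = np.argsort(eigenvalues)[::-1]               # rank components by variance
    explained = eigenvalues[order] / eigenvalues.sum()
    print("variance fraction per principal component:", explained.round(3))
    print("first principal component (direction in T-P space):",
          eigenvectors[:, order[0]].round(3))
    # Most of the variance lies along the first principal component, so a single
    # combined variable describes these data almost as well as the original two.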

In the cases discussed by Professor Disney yesterday the data involved six measurable parameters of each galaxy: (1) a dynamical mass estimate; (2) the mass inferred from HI emission (21cm); (3) the total luminosity; (4) radius; (5) a measure of the central concentration of the galaxy; and (6) a measure of its colour. The PCA analysis of these data reveals that about 80% of the variance in the data set is associated with the first principal component, so there is clearly a significant correlation present in the data although, to be honest, I have seen many PCA analyses with much stronger concentrations of variance in the first eigenvector so it doesn’t strike me as being particularly strong.

However, thinking as a physicist rather than a statistician there is clearly something very interesting going on. From a theoretical point of view one would imagine that the properties of an individual galaxy might be controlled by as many as six independent parameters including mass, angular momentum, baryon fraction, age and size, as well as by the accidents of its recent haphazard merger history.

Disney et al. argue that for gaseous galaxies to appear as a one-parameter set, as observed here, the theory of galaxy formation and evolution must supply at least five independent constraint equations in order to collapse everything into a single parameter.

This is all vaguely reminiscent of the Hertzsprung-Russell diagram, or at least the main sequence thereof:

 

You can see here that there’s a correlation between temperature and luminosity which constrains this particular bivariate data set to lie along a (nearly) one-dimensional track in the diagram. In fact these properties correlate with each other because there is a single parameter model relating all properties of main sequence stars to their mass. In other words, once you fix the mass of a main sequence star, it has a fixed  luminosity, temperature, and radius (apart from variations caused by age, metallicity, etc). Of course the problem is that masses of stars are difficult to determine so this parameter is largely hidden from the observer. What is really happening is that luminosity and temperature correlate with each other, because they both depend on the  hidden parameter mass.

I don’t think that the PCA result disproves the current theory of hierarchical galaxy formation (which is what Disney claims) but it will definitely be a challenge for theorists to provide a satisfactory explanation of the result! My own guess for the physical parameter that accounts for most of the variation in this data set is the mass of the dark halo within which the galaxy is embedded. In other words, it might really be just like the Hertzsprung-Russell diagram…

But back to my argument with Mike Disney. I asked what is the first principal component of the galaxy data, i.e. what does the principal eigenvector look like? He refused to answer, saying that it was impossible to tell. Of course it isn’t, as the PCA method actually requires it to be determined. Further questioning seemed to reveal a basic misunderstanding of the whole idea of PCA which made the assertion that all of modern cosmology would need to be revised somewhat difficult to swallow.  At that point of deadlock, I got very angry and stormed out.

I realise that behind the confusion was a reasonable point. The first principal component is well-defined, i.e. v1 is completely well defined in the first figure. However, along the line defined by that vector, P and T are proportional to each other so in a sense only one of them is needed to specify a position along this line. But you can’t say on the basis of this analysis alone that the fundamental variable is either pressure or temperature; they might be correlated through a third quantity you don’t know about.

Anyway, as a postscript I’ll say I did go and apologize to Mike Disney afterwards for losing my rag. He was very forgiving, although I probably now have a reputation for being a grumpy old bastard. Which I suppose I am. He also said one other thing,  that he didn’t mind me getting angry because it showed I cared about the truth. Which I suppose I do.



Doubts about the Evidence for Penrose’s Cyclic Universe

Posted in Bad Statistics, Cosmic Anomalies, The Universe and Stuff with tags , , , , , , on November 28, 2010 by telescoper

A strange paper by Gurzadyan and Penrose hit the arXiv a week or so ago. It seems to have generated quite a lot of reaction in the blogosphere and has now made it onto the BBC News, so I think it merits a comment.

The authors claim to have found evidence that supports Roger Penrose‘s conformal cyclic cosmology in the form of a series of (concentric) rings of unexpectedly low variance in the pattern of fluctuations in the cosmic microwave background seen by the Wilkinson Microwave Anisotropy Probe (WMAP). There’s no doubt that a real discovery of such signals in the WMAP data would point towards something radically different from the standard Big Bang cosmology.

I haven’t tried to reproduce Gurzadyan & Penrose’s result in detail, as I haven’t had time to look at it, and I’m not going to rule it out without doing a careful analysis myself. However, what I will say here is that I think you should take the statistical part of their analysis with a huge pinch of salt.

Here’s why.

The authors report a hugely significant detection of their effect (they quote a “6-σ” result; in other words, a feature that would arise in the standard cosmological model with a probability of less than 10⁻⁷). The type of signal can be seen in their Figure 2, which I reproduce here:

Sorry they’re hard to read, but these show the variance measured on concentric rings (y-axis) of varying radius (x-axis) as seen in the WMAP W (94 GHz) and V (54 GHz) frequency channels (top two panels) compared with what is seen in a simulation with purely Gaussian fluctuations generated within the framework of the standard cosmological model (lower panel). The contrast looks superficially impressive, but there’s much less to it than meets the eye.

For a start, the separate WMAP W and V channels are not the same as the cosmic microwave background. There is a great deal of galactic foreground that has to be cleaned out of these maps before the pristine primordial radiation can be isolated. The fact that similar patterns can be found in the BOOMERANG data by no means rules out a foreground contribution as a common explanation of the anomalous variance. The authors have excluded the region at low galactic latitude (|b|<20°) in order to avoid the most heavily contaminated parts of the sky, but this is by no means guaranteed to eliminate foreground contributions entirely. Here is the all-sky WMAP W-band map, for example:

Moreover, these maps also contain considerable systematic effects arising from the scanning strategy of the WMAP satellite. The most obvious of these is that the signal-to-noise varies across the sky, but there are others, such as the finite size of the beam of the WMAP telescope.

Neither galactic foregrounds nor correlated noise are present in the Gaussian simulation shown in the lower panel, and the authors do not say what kind of beam smoothing is used either. The comparison of WMAP single-channel data with simple Gaussian simulations is consequently deeply flawed and the significance level quoted for the result is certainly meaningless.

Having not looked at this in detail myself I’m not going to say that the authors’ conclusions are necessarily false, but I would be very surprised if an effect this large were real, given the strenuous efforts so many people have made to probe the detailed statistics of the WMAP data; see, e.g., various items in my blog category on cosmic anomalies. Cosmologists have been wrong before, of course, but then so have even eminent physicists like Roger Penrose…

Another point that I’m not at all sure about is whether, even if the rings of low variance are real – which I doubt – they really provide evidence of a cyclic universe. It doesn’t seem obvious to me that the model Penrose advocates would actually produce a CMB sky with such properties anyway.

Above all, I stress that this paper has not been subjected to proper peer review. If I were the referee I’d demand a much higher level of rigour in the analysis before I would allow it to be published in a scientific journal. Until the analysis is done satisfactorily, I suggest that serious students of cosmology shouldn’t get too excited by this result.

It occurs to me that other cosmologists out there might have looked at this result in more detail than I have had time to. If so, please feel free to add your comments in the box…

IMPORTANT UPDATE: 7th December. Two papers have now appeared on the arXiv (here and here) which refute the Gurzadyan-Penrose claim. Apparently, the data behave as Gurzadyan and Penrose claim, but so do proper simulations. In other words, it’s the bottom panel of the figure that’s wrong.

ANOTHER UPDATE: 8th December. Gurzadyan and Penrose have responded with a two-page paper which makes so little sense I had better not comment at all.



A Little Bit of Bayes

Posted in Bad Statistics, The Universe and Stuff with tags , , , , , , on November 21, 2010 by telescoper

I thought I’d start a series of occasional posts about Bayesian probability. This is something I’ve touched on from time to time, but it’s perhaps worth covering this relatively controversial topic in a slightly more systematic fashion, especially with regard to how it works in cosmology.

I’ll start with Bayes’ theorem which, for three logical propositions (such as statements about the values of parameters in a theory) A, B and C, can be written in the form

P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)

where

K=P(A|C).

This is (or should be!)  uncontroversial as it is simply a result of the sum and product rules for combining probabilities. Notice, however, that I’ve not restricted it to two propositions A and B as is often done, but carried throughout an extra one (C). This is to emphasize the fact that, to a Bayesian, all probabilities are conditional on something; usually, in the context of data analysis this is a background theory that furnishes the framework within which measurements are interpreted. If you say this makes everything model-dependent, then I’d agree. But every interpretation of data in terms of parameters of a model is dependent on the model. It has to be. If you think it can be otherwise then I think you’re misguided.

In the equation, P(B|C) is the probability of B being true, given that C is true. The information C need not be definitely known, but perhaps assumed for the sake of argument. The left-hand side of Bayes’ theorem denotes the probability of B given both A and C, and so on. The presence of C has not changed anything, but is just there as a reminder that it all depends on what is being assumed in the background. The equation states a theorem that can be proved to be mathematically correct so it is – or should be – uncontroversial.

Now comes the controversy. In the “frequentist” interpretation of probability, the entities A, B and C would be interpreted as “events” (e.g. the coin is heads) or “random variables” (e.g. the score on a dice, a number from 1 to 6) attached to which is their probability, indicating their propensity to occur in an imagined ensemble. These things are quite complicated mathematical objects: they don’t have specific numerical values, but are represented by a measure over the space of possibilities. They are sort of “blurred-out” in some way, the fuzziness representing the uncertainty in the precise value.

To a Bayesian, the entities A, B and C have a completely different character to what they represent for a frequentist. They are not “events” but  logical propositions which can only be either true or false. The entities themselves are not blurred out, but we may have insufficient information to decide which of the two possibilities is correct. In this interpretation, P(A|C) represents the degree of belief that it is consistent to hold in the truth of A given the information C. Probability is therefore a generalization of the “normal” deductive logic expressed by Boolean algebra: the value “0” is associated with a proposition which is false and “1” denotes one that is true. Probability theory extends  this logic to the intermediate case where there is insufficient information to be certain about the status of the proposition.

A common objection to Bayesian probability is that it is somehow arbitrary or ill-defined. “Subjective” is the word that is often bandied about. This is only fair to the extent that different individuals may have access to different information and therefore assign different probabilities. Given different information C and C′ the probabilities P(A|C) and P(A|C′) will be different. On the other hand, the same precise rules for assigning and manipulating probabilities apply as before. Identical results should therefore be obtained whether these are applied by any person, or even a robot, so that part isn’t subjective at all.

In fact I’d go further. I think one of the great strengths of the Bayesian interpretation is precisely that it does depend on what information is assumed. This means that such information has to be stated explicitly. The essential assumptions behind a result can be – and, regrettably, often are – hidden in frequentist analyses. Being a Bayesian forces you to put all your cards on the table.

To a Bayesian, probabilities are always conditional on other assumed truths. There is no such thing as an absolute probability, hence my alteration of the form of Bayes’ theorem to represent this. A probability such as P(A) has no meaning to a Bayesian: there is always conditioning information. For example, if I blithely assign a probability of 1/6 to each face of a dice, that assignment is actually conditional on me having no information to discriminate between the appearance of the faces, and no knowledge of the rolling trajectory that would allow me to make a prediction of its eventual resting position.

In the Bayesian framework, probability theory becomes not a branch of experimental science but a branch of logic. Like any branch of mathematics it cannot be tested by experiment but only by the requirement that it be internally self-consistent. This brings me to what I think is one of the most important results of twentieth century mathematics, but which is unfortunately almost unknown in the scientific community. In 1946, Richard Cox derived the unique generalization of Boolean algebra under the assumption that such a logic must involve associating a single number with any logical proposition. The result he got is beautiful and anyone with any interest in science should make a point of reading his elegant argument. It turns out that the only way to construct a consistent logic of uncertainty incorporating this principle is by using the standard laws of probability. There is no other way to reason consistently in the face of uncertainty than probability theory. Accordingly, probability theory always applies when there is insufficient knowledge for deductive certainty. Probability is inductive logic.

This is not just a nice mathematical property. This kind of probability lies at the foundations of a consistent methodological framework that not only encapsulates many common-sense notions about how science works, but also puts at least some aspects of scientific reasoning on a rigorous quantitative footing. This is an important weapon that should be used more often in the battle against the creeping irrationalism one finds in society at large.

I posted some time ago about an alternative way of deriving the laws of probability from consistency arguments.

To see how the Bayesian approach works, let us consider a simple example. Suppose we have a hypothesis H (some theoretical idea that we think might explain some experiment or observation). We also have access to some data D, and we also adopt some prior information I (which might be the results of other experiments or simply working assumptions). What we want to know is how strongly the data D support the hypothesis H given our background assumptions I. To keep it easy, we assume that the choice is between whether H is true or H is false. In the latter case, “not-H” or H′ (for short) is true. If our experiment is at all useful we can construct P(D|HI), the probability that the experiment would produce the data set D if both our hypothesis and the conditional information are true.

The probability P(D|HI) is called the likelihood; to construct it we need to have   some knowledge of the statistical errors produced by our measurement. Using Bayes’ theorem we can “invert” this likelihood to give P(H|DI), the probability that our hypothesis is true given the data and our assumptions. The result looks just like we had in the first two equations:

P(H|DI) = K^{-1}P(H|I)P(D|HI) .

Now we can expand the “normalising constant” K because we know that either H or H′ must be true. Thus

K=P(D|I)=P(H|I)P(D|HI)+P(H^{\prime}|I) P(D|H^{\prime}I)

The P(H|DI) on the left-hand side of the first expression is called the posterior probability; the right-hand side involves P(H|I), which is called the prior probability and the likelihood P(D|HI). The principal controversy surrounding Bayesian inductive reasoning involves the prior and how to define it, which is something I’ll comment on in a future post.
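Putting some numbers in may help to make the recipe clearer. Here is a minimal sketch in Python; the prior and the two likelihoods are entirely made up for illustration:

    # Prior degree of belief in H, and the likelihoods of the data under H and H'
    p_H = 0.2                      # P(H|I), assumed for illustration
    p_Hprime = 1.0 - p_H           # P(H'|I)
    p_D_given_H = 0.7              # P(D|HI)
    p_D_given_Hprime = 0.1         # P(D|H'I)

    # The normalising constant K = P(D|I), expanded over the two alternatives
    K = p_H * p_D_given_H + p_Hprime * p_D_given_Hprime

    posterior_H = p_H * p_D_given_H / K
    print(f"P(D|I) = {K:.3f}, posterior P(H|DI) = {posterior_H:.3f}")
    # The data raise the degree of belief in H from 0.2 to about 0.64 --
    # a substantial shift, but still well short of certainty.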

The Bayesian recipe for testing a hypothesis assigns a large posterior probability to a hypothesis for which the product of the prior probability and the likelihood is large. It can be generalized to the case where we want to pick the best of a set of competing hypotheses, say H1 … Hn. Note that this need not be the set of all possible hypotheses, just those that we have thought about. We can only choose from what is available. The hypotheses may be relatively simple, such as that some particular parameter takes the value x, or they may be composite, involving many parameters and/or assumptions. For instance, the Big Bang model of our universe is a very complicated hypothesis, or in fact a combination of hypotheses joined together, involving at least a dozen parameters which can’t be predicted a priori but which have to be estimated from observations.

The required result for multiple hypotheses is pretty straightforward: the sum of the two alternatives involved in K above simply becomes a sum over all possible hypotheses, so that

P(H_i|DI) = K^{-1}P(H_i|I)P(D|H_iI),

and

K=P(D|I)=\sum P(H_j|I)P(D|H_jI)

If the hypothesis concerns the value of a parameter – in cosmology this might be, e.g., the mean density of the Universe expressed by the density parameter Ω0 – then the allowed space of possibilities is continuous. The sum in the denominator should then be replaced by an integral, but conceptually nothing changes. Our “best” hypothesis is the one that has the greatest posterior probability.

From a frequentist stance the procedure is often instead to just maximize the likelihood. According to this approach the best theory is the one that makes the data most probable. This can coincide with the most probable theory, but only if the prior probability is constant; in general the probability of a model given the data is not the same as the probability of the data given the model. I’m amazed how many practising scientists make this error on a regular basis.
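The difference is easy to demonstrate numerically. In the following sketch (Python; the “measurements”, error bar and prior are all invented) the posterior is built on a grid of parameter values, so the sum over hypotheses in K becomes a sum over grid points; with a non-uniform prior the most probable parameter value is not the one that maximises the likelihood:

    import numpy as np

    theta = np.linspace(0.0, 2.0, 2001)                # grid of parameter values
    data = np.array([1.10, 0.95, 1.30, 1.05, 1.20])    # five noisy "measurements", sigma = 0.2

    # Likelihood P(D|theta,I): independent Gaussian errors
    log_like = -0.5 * np.sum((data[:, None] - theta[None, :]) ** 2 / 0.2 ** 2, axis=0)
    likelihood = np.exp(log_like - log_like.max())

    # A non-uniform prior P(theta|I), preferring smaller parameter values
    prior = np.exp(-theta / 0.5)

    posterior = prior * likelihood
    posterior /= posterior.sum() * (theta[1] - theta[0])   # normalisation plays the role of K

    print("maximum-likelihood value:", theta[np.argmax(likelihood)])
    print("maximum-posterior value :", theta[np.argmax(posterior)])
    # With a flat prior these coincide; with this prior they differ slightly.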

The following figure might serve to illustrate the difference between the frequentist and Bayesian approaches. In the former case, everything is done in “data space” using likelihoods, and in the other we work throughout with probabilities of hypotheses, i.e. we think in hypothesis space. I find it interesting to note that most theorists that I know who work in cosmology are Bayesians and most observers are frequentists!


As I mentioned above, it is the presence of the prior probability in the general formula that is the most controversial aspect of the Bayesian approach. The attitude of frequentists is often that this prior information is completely arbitrary or at least “model-dependent”. Being empirically-minded people, by and large, they prefer to think that measurements can be made and interpreted without reference to theory at all.

Assuming we can assign the prior probabilities in an appropriate way, what emerges from the Bayesian framework is a consistent methodology for scientific progress. The scheme starts with the hardest part – theory creation. This requires human intervention, since we have no automatic procedure for dreaming up hypotheses from thin air. Once we have a set of hypotheses, we need data against which theories can be compared using their relative probabilities. The experimental testing of a theory can happen in many stages: the posterior probability obtained after one experiment can be fed in as the prior for the next. The order of experiments does not matter. This all happens in an endless loop, as models are tested and refined by confrontation with experimental discoveries, and are forced to compete with new theoretical ideas. Often one particular theory emerges as most probable for a while, such as in particle physics where a “standard model” has been in existence for many years. But this does not make it absolutely right; it is just the best bet amongst the alternatives. Likewise, the Big Bang model does not represent the absolute truth, but is just the best available model in the face of the manifold relevant observations we now have concerning the Universe’s origin and evolution. The crucial point about this methodology is that it is inherently inductive: all the reasoning is carried out in “hypothesis space” rather than “observation space”. The primary form of logic involved is not deduction but induction. Science is all about inverse reasoning.

For comments on induction versus deduction in another context, see here.

So what are the main differences between the Bayesian and frequentist views?

First, I think it is fair to say that the Bayesian framework is enormously more general than is allowed by the frequentist notion that probabilities must be regarded as relative frequencies in some ensemble, whether that is real or imaginary. In the latter interpretation, a proposition is at once true in some elements of the ensemble and false in others. It seems to me to be a source of great confusion to substitute a logical AND for what is really a logical OR. The Bayesian stance is also free from problems associated with the failure to incorporate in the analysis any information that can’t be expressed as a frequency. Would you really trust a doctor who said that 75% of the people she saw with your symptoms required an operation, but who did not bother to look at your own medical files?

As I mentioned above, frequentists tend to talk about “random variables”. This takes us into another semantic minefield. What does “random” mean? To a Bayesian there are no random variables, only variables whose values we do not know. A random process is simply one about which we only have sufficient information to specify probability distributions rather than definite values.

More fundamentally, it is clear from the fact that the combination rules for probabilities were derived by Cox uniquely from the requirement of logical consistency, that any departure from these rules will generally speaking involve logical inconsistency. Many of the standard statistical data analysis techniques – including the simple “unbiased estimator” mentioned briefly above – used when the data consist of repeated samples of a variable having a definite but unknown value, are not equivalent to Bayesian reasoning. These methods can, of course, give good answers, but they can all be made to look completely silly by suitable choice of dataset.

By contrast, I am not aware of any example of a paradox or contradiction that has ever been found using the correct application of Bayesian methods, although the methods can of course be applied incorrectly. Furthermore, in order to deal with unique events like the weather, frequentists are forced to introduce the notion of an ensemble, a perhaps infinite collection of imaginary possibilities, to allow them to retain the notion that probability is a proportion. Provided the calculations are done correctly, the results of these calculations should agree with the Bayesian answers. On the other hand, frequentists often talk about the ensemble as if it were real, and I think that is very dangerous…



Seeing Dark Matter..

Posted in The Universe and Stuff with tags , , , , on November 13, 2010 by telescoper

I found this intriguing and impressive image over at Cosmic Variance (there’s also a press release at the Hubble Space Telescope website with higher resolution images). It shows the giant cluster of galaxies Abell 1689 with, superimposed on it, a map of the matter distribution as reconstructed from the pattern of distortions of background galaxy images caused by gravitational lensing.

This picture confirms the existence of large amounts of dark matter in the cluster – the mass distribution causing the lensing is quite different from what you can see in the luminous matter – but it also poses a problem, in that the matter is much more concentrated in the centre of the cluster than current theoretical ideas seem to suggest it should be…

You can find the full paper here.



A New Theory of Dark Matter

Posted in Science Politics, The Universe and Stuff with tags , , , , , , on November 6, 2010 by telescoper

Since this week has seen the release of a number of interesting bits of news about particle physics and cosmology, I thought I’d take the chance to keep posting about science by way of a distraction from the interminable discussion of  funding and related political issues. This time I thought I’d share some of my own theoretical work, which I firmly believe offers a viable alternative to current orthodox thinking in the realm of astroparticle physics.

As you probably know, one of the most important outstanding problems in this domain is to find an explanation of dark matter, a component of the matter distribution of the Universe which is inferred to exist from its effects on the growth of cosmic structures but which is yet to be detected by direct observations. We know that this dark matter can’t exist in the form of familiar atomic material (made of protons, neutrons and electrons) so it must comprise some other form of matter. Many candidates exist, but the currently favoured model is that it is made of weakly interacting massive particles (WIMPs) arising in particle physics theories involving supersymmetry, perhaps the fermionic counterpart of the gauge bosons of the standard model, e.g. the photino (the supersymmetric counterpart of the photon).

However, extensive recent research has revealed that this standard explanation may in fact be incorrect and circumstantial evidence is mounting that supports a  radically different scenario. I am now in a position to reveal the basics of a new theory that accounts for many recent observations in terms of an alternative hypothesis, which entails the existence of a brand new particle called the k-Mason.

Standard WIMP dark matter comprises very massive particles which move very slowly, hence the term Cold Dark Matter or CDM, for short.  This means that CDM forms structures very rapidly and efficiently, in a hierarchical or “bottom-up” fashion. This idea is at the core of the standard “concordance” cosmological model.

However, the k-Mason is known to travel such huge distances at such high velocity in random directions between its (rare) encounters that it not only inhibits the self-organisation of other matter, but actively dissipates structures once they have been formed. All this means that structure formation is strongly suppressed and can only happen in a “top-down” manner, which is extremely inefficient as it can only form small-scale structures through the collapse of larger ones. Astronomers have compiled a huge amount of evidence of this effect in recent years, lending support to the existence of the k-Mason as a dominant influence  (which is of course entirely at odds with the whole idea of concordance).

Other studies also provide pretty convincing quantitative evidence of the large mean free path of the k-Mason.

Although this new scenario does seem to account very naturally for the observational evidence of  collapse and fragmentation gathered by UK astronomers since 2007, there are still many issues to be resolved before it can be developed into a fully testable theory. One difficulty is that the k-Mason appears to be surprisingly stable, whereas most theories suggest it would have vanished long before the present epoch. On the other hand, it has also been suggested that, rather than simply decaying, the k-Mason may instead  transform into some other species with similar properties; suggestions for alternative candidates emerging from the decay of the  k-Mason  are actively being sought and it is hoped this process will be observed definitively within the next 18 months or so.

However the biggest problem facing this idea is the extreme difficulty of  detecting the k-Mason  at experimental or observational facilities. Some scientists have claimed evidence of its appearance at various laboratories run by the UK’s Science and Technology Facilities Council (STFC), as well as at the Large Hadron Collider at CERN, but these claims remain controversial: none has really stood up to detailed scrutiny and all lack independent confirmation from reliable witnesses. Likewise there is little proof of the presence of k-Mason at any ground-based astronomical observatory, which has led many astronomers to conclude that  only observations done from space will remain viable in the longer term.

So, in summary, while the k-Mason remains a hypothetical entity, it does furnish a plausible theory that accounts, in a broad-brush sense, for many disparate phenomena. I urge particle physicists, astronomers and cosmologists to join forces in the hunt for this enigmatic object.

NOTE ADDED IN PROOF: The hypothetical “k-Mason” referred to in this article is not to be confused with the better-known “strange” particle the  k-Meson.



Finding Gravitational Lenses, the Herschel Way…

Posted in The Universe and Stuff with tags , , , , , , on November 4, 2010 by telescoper

It’s nice to have the chance to blog for once about some exciting astrophysics rather than doom and gloom about budget cuts. Tomorrow (5th November) sees the publication of a long-awaited article (by Negrello et al.)  in the journal Science (abstract here) that presents evidence of discovery of a number of new gravitational lens systems using the Herschel Space Observatory.

There is a press release accompanying this paper on the  Cardiff University website, and a longer article on the Herschel Outreach website, from which I nicked the following nice graphic (click on it for a bigger version).

This shows rather nicely how a gravitational lens works: it’s basically a concentration of matter (in this case a galaxy) along the line of sight from the observer to a background source (in this case another galaxy). Light from the background object gets bent by the foreground object, forming multiple images which are usually both magnified and distorted. Gravitational lensing itself is not a new discovery, but what is especially interesting about the new results is that they suggest a much more efficient way of finding lensed systems than we have previously had.

In the past they have usually been found by laboriously scouring optical (or sometimes radio) images of very faint galaxies. A candidate lens is identified (perhaps as a close-set group of images with similar colours), and then followed up with detailed spectroscopy to establish whether the images are actually all at the same redshift, which they should be if they are part of a lens system. Unfortunately, only about one in ten of the candidate lens systems found this way turn out to be actual lenses, so this isn’t a very efficient way of finding them. Even multiple needles are hard to find in a haystack.

The new results have emerged from a large survey, called H-ATLAS, of galaxies detected in the far-infrared/submillimetre part of the spectrum. Even the preliminary stages of this survey covered a sufficiently large part of the sky – and sufficiently many galaxies within the region studied – to suggest  the presence of a significant population of galaxies that bear all the hallmarks of being lensed.

The new Science article discusses five surprisingly bright objects found early on during the course of the H-ATLAS survey. The galaxies found with optical telescopes in the directions of these sources would not normally be expected to be bright at the far-infrared wavelengths observed by Herschel. This suggested that the galaxies seen in visible light might be gravitational lenses magnifying much more distant background galaxies seen by Herschel. With the relatively poor resolution that comes from working at long wavelengths, Herschel can’t resolve the individual images produced by the lens, but does collect more photons from a lensed galaxy than an unlensed one, so it appears much brighter in the detectors.

 

Detailed spectroscopic follow-up using ground-based radio and sub-millimetre telescopes confirmed these ideas: the galaxies seen by the optical telescopes are much closer, each ideally positioned to act as a gravitational lens.

These results demonstrate that gravitational lensing is probably at work in all the distant and bright galaxies seen by Herschel. This, in turn, suggests that the full H-ATLAS survey might provide huge numbers of gravitational lens systems, enough to perform a number of powerful statistical tests of theories of galaxy formation and evolution. It’s a bit of a cliché to say so, but it looks like Herschel will indeed open up a new window on the distant Universe.

P.S. For the record, although I’m technically a member of the H-ATLAS consortium, I was not directly involved in this work and am not among the authors.

P.P.S. This announcement also gives me the opportunity to pass on the information that all the data arising from the H-ATLAS science demonstration phase is now available online for you to play with!



There is no Zero

Posted in The Universe and Stuff with tags , , , , on October 1, 2010 by telescoper

The Incredible Shrinking Man is a science fiction film made in 1957. If you haven’t seen it before, its title will probably make you think it’s a downmarket B-movie, but it’s far from that. In fact it was very well received by film critics when it was first released, and in 2009 it was added to the Library of Congress list of films considered to be culturally, historically or aesthetically significant. The special effects used to portray the main character reducing in size were remarkable for their day, but for me the film is worth it for the wonderful ending shown in the clip:

I first saw this film on TV when I was at school and the final monologue made such an impression on me that it keeps popping into my mind, as it just did. The field of astroparticle physics encompasses cosmology, the study of the Universe on the largest scales accessible to observation (many billions of light years) as well as the smallest dimensions we can probe using the techniques of particle physics.  As the Incredible Shrinking Man realises, these are just two aspects of the same underlying unity. There’s nothing specifically new about this line of reasoning, however; I posted a poem a while ago that dates from 1675 which has a similar theme.

I decided to put the clip up now for two reasons. One is that the phrase “there is no zero” (which has passed me by on previous occasions I’ve watched the clip) reminds me of some stuff I wrote recently for a book that I’m struggling to finish, about how there’s no such thing as nothing in physics. Space is much more than the absence of matter and even empty space isn’t the same thing as nothing at all. Zero is also just the flip side of infinity and I don’t think infinity exists in nature either. When infinity appears in our theories it’s just a flag to tell us we don’t know what we’re doing. Many others have thought this thought: both Gauss and, later, Hilbert argued against the possibility of there being realised infinities in nature. My old friend and erstwhile collaborator George Ellis adheres to this view too.

The other reason for posting it is that, in these days of the Incredible Shrinking Science Budget, it’s important that we recognize and nurture the deep connections between things by supporting science in all its forms. Once we start trying to unpick its strands, the web of knowledge will all too quickly unravel.

