Archive for nature

Coming of Age in a Low-Density Universe

Posted in Biographical, Open Access, The Universe and Stuff on August 25, 2024 by telescoper

I was reminded just now that 30 years ago today, on 25th August 1994, this review article by myself and George Ellis was published in Nature (volume 370, pp. 609–615).

Sorry for the somewhat scrappy scanned copy. The article is still behind a paywall. No open access for the open Universe!

Can this really have been 30 years ago?

Anyway, that was the day I officially became labelled a “crank” by some, although others thought we were pushing at an open door. We were arguing against the then-standard cosmological model (based on the Einstein–de Sitter solution), but the weight of evidence was already starting to shift. Although we didn’t predict the arrival of dark energy, the arguments we presented about the density of matter did turn out to be correct. A lot has changed since 1994, but we still live in a Universe with a matter density much lower than the critical density, and our estimate of that density turned out to be spot on.

Looking back on this, I think valuable lessons could be learned if someone had the time and energy to go through precisely why so many papers at that time claimed evidence for a Universe of higher density than the one we have now settled on. Confirmation bias undoubtedly played a role, and who is to say that it isn’t relevant to this day?

A Lamentation of Swans

Posted in Maynooth on July 18, 2024 by telescoper

I was looking forward to renewing my acquaintance with the beautiful swans of Maynooth when my sabbatical is over, but I’ve heard that recent tragic events mean that won’t be happening. The title of this post is not just a reference to the collective noun for swans, but literally a lamentation.

A pair of swans had been nesting on the little island in the harbour of the Royal Canal at Maynooth for several years, since before I arrived here. Every spring they have raised a batch of cygnets, which have grown up over the summer and departed for a new life elsewhere. I’ve always enjoyed watching the little ones grow and learn how to find food under the very watchful eyes of their parents. I had no reason to think things would be any different this year. I was wrong.

The first calamity, earlier this year, was that the island flooded, destroying this year’s batch of eggs. The swans tried again, and managed two more cygnets, but neither survived. I’m not sure exactly what happened, but it seems that various locks were opened to allow water into that section of the canal. England has a much more extensive system of waterways than Ireland, and when rivers there are close to flooding, water is often diverted into canals to stop them breaking their banks. I guess something similar happened here, but I don’t know.

It was bad enough that there are no cygnets this year, but worse was to come. Recently the female swan (pen) was found to be very ill. She was taken away by the Kildare Wildlife Rescue (KWR) team and cared for, but sadly passed away. All I know is that it seemed to be “an infection”, which may or may not be what killed the cygnets. Avian flu is a possibility, as is some form of poisoning such as botulism. Sadly, people do feed the birds in the harbour with inappropriate things, so this might also be a contributing factor.

What about the male swan (cob)? Well, he has gone. I don’t know whether he died too or whether he just left. Swans mate for life and I’ve heard of cases when one of a pair has simply pined away when the other has died.

So there are no swans nesting in Maynooth anymore. It’s really very sad. Swans are beautiful creatures and the pair on the canal was very well known to the community. I hope that another pair will nest on the island before too long. It may even be that a pair of rescue birds will be rehomed there by KWR. Before that happens, though, I hope they find out what exactly caused the swans to die. We don’t want more deaths.

Distant Things!

Posted in The Universe and Stuff on April 1, 2022 by telescoper

I’m a bit late passing this on but there was a great deal of excitement this week at the news that the Hubble Space Telescope (HST) has made an astonishing discovery about the early Universe as illustrated by the above picture published in Nature. As well as an individual star (?) observed at redshift 6.2, so distant that its light set out when the Universe was just 8% of its current age, the image also reveals the presence in the early Universe of large geometric shapes (such as rectangles) as well as a remarkable giant arrow. The presence of these features at such high redshift is completely inconsistent with the standard theory of structure formation.

Frequentism: the art of probably answering the wrong question

Posted in Bad Statistics on September 15, 2014 by telescoper

Popped into the office for a spot of lunch in between induction events and discovered that Jon Butterworth has posted an item on his Grauniad blog about how particle physicists use statistics, and the ‘5σ rule’ that is usually employed as a criterion for the detection of, e.g. a new particle. I couldn’t resist bashing out a quick reply, because I believe that actually the fundamental issue is not whether you choose 3σ or 5σ or 27σ but what these statistics mean or don’t mean.

As was the case with a Nature piece I blogged about some time ago, Jon’s article focuses on the p-value, a frequentist concept that corresponds to the probability of obtaining a value at least as large as that obtained for a test statistic under a particular null hypothesis. To give an example, the null hypothesis might be that two variates are uncorrelated; the test statistic might be the sample correlation coefficient r obtained from a set of bivariate data. If the data were uncorrelated then r would have a known probability distribution, and if the value measured from the sample were such that its numerical value would be exceeded with a probability of 0.05 then the p-value (or significance level) is 0.05. This is usually called a ‘2σ’ result because, for Gaussian statistics, a variable has a probability of about 95% of lying within 2σ of the mean value.
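To make this concrete, here is a minimal sketch in Python; the numbers are simply made up for the purpose of illustration, not taken from any real data set:

```python
# Minimal illustrative sketch: the p-value for a sample correlation coefficient
# under the null hypothesis that the two variates are uncorrelated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=50)
y = 0.3 * x + rng.normal(size=50)   # weakly correlated fake data

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p-value = {p:.4f}")
# The p-value is the probability, *if* x and y were really uncorrelated, of
# obtaining a correlation at least as far from zero as the one observed. It is
# not the probability that the null hypothesis is true.
```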

Anyway, whatever the null hypothesis happens to be, you can see that the way a frequentist would proceed would be to calculate what the distribution of measurements would be if it were true. If the actual measurement is deemed to be unlikely (say that it is so high that only 1% of measurements would turn out that large under the null hypothesis) then you reject the null, in this case with a “level of significance” of 1%. If you don’t reject it then you tacitly accept it unless and until another experiment does persuade you to shift your allegiance.
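Here is a hedged sketch of that recipe: simulate the distribution of the test statistic under the null hypothesis, then reject if the observed value falls in the extreme tail (the 1% level and the ‘observed’ correlation below are illustrative choices only):

```python
# Sketch of the frequentist recipe: build the null distribution of the test
# statistic by simulation, then reject the null hypothesis if the observed value
# is more extreme than, say, 99% of the simulated ones (a 1% significance level).
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 50, 10_000

def sample_r(rng, n):
    x, y = rng.normal(size=n), rng.normal(size=n)   # null: x and y uncorrelated
    return np.corrcoef(x, y)[0, 1]

null_r = np.array([sample_r(rng, n) for _ in range(n_sims)])
threshold = np.quantile(np.abs(null_r), 0.99)       # two-sided 1% cut

r_observed = 0.45                                   # hypothetical measurement
print(f"threshold |r| = {threshold:.3f}; reject null: {abs(r_observed) > threshold}")
```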

But the p-value merely specifies the probability that you would reject the null-hypothesis if it were correct. This is what you would call making a Type I error. It says nothing at all about the probability that the null hypothesis is actually a correct description of the data. To make that sort of statement you would need to specify an alternative distribution, calculate the distribution based on it, and hence determine the statistical power of the test, i.e. the probability that you would actually reject the null hypothesis when it is incorrect. To fail to reject the null hypothesis when it’s actually incorrect is to make a Type II error.
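A toy simulation makes the distinction concrete; the sample size and effect size below are arbitrary choices for illustration, not taken from any real analysis:

```python
# Toy estimates of the Type I error rate and the power of a correlation test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, n_sims = 0.05, 50, 5000

def p_value(slope):
    x = rng.normal(size=n)
    y = slope * x + rng.normal(size=n)
    return stats.pearsonr(x, y)[1]

type_i = np.mean([p_value(0.0) < alpha for _ in range(n_sims)])   # null is true
power  = np.mean([p_value(0.3) < alpha for _ in range(n_sims)])   # null is false
print(f"Type I error rate ~ {type_i:.3f} (should be close to alpha = {alpha})")
print(f"Power ~ {power:.3f}, so the Type II error rate ~ {1 - power:.3f}")
```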

If all this stuff about p-values, significance, power and Type I and Type II errors seems a bit bizarre, I think that’s because it is. It’s so bizarre, in fact, that I think most people who quote p-values have absolutely no idea what they really mean. Jon’s piece demonstrates that he does, so this is not meant as a personal criticism, but it is a pervasive problem that results quoted in such a way are intrinsically confusing.

The Nature story mentioned above argues that, in fact, results quoted with a p-value of 0.05 turn out to be wrong about 25% of the time. There are a number of reasons why this could be the case, including that the p-value is being calculated incorrectly, perhaps because some assumption or other turns out not to be true; a widespread example is assuming that the variates concerned are normally distributed. Unquestioning application of off-the-shelf statistical methods in inappropriate situations is a serious problem in many disciplines, but is particularly prevalent in the social sciences, where samples are typically rather small.
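One way to see how a nominally ‘significant’ result can be wrong that often is to fold in a prior: if only a minority of the hypotheses being tested correspond to real effects, a sizeable fraction of p < 0.05 results will be false positives. The numbers below are purely illustrative assumptions of mine, not the ones used in the Nature piece:

```python
# Illustrative (assumed) numbers for the fraction of 'significant' results that are wrong.
alpha = 0.05        # significance threshold
power = 0.8         # assumed probability of detecting a real effect
prior_real = 0.2    # assumed fraction of tested hypotheses that are real effects

false_pos = (1 - prior_real) * alpha   # null true, but rejected anyway
true_pos  = prior_real * power         # real effect, correctly detected
fraction_wrong = false_pos / (false_pos + true_pos)
print(f"Fraction of p < 0.05 results that are spurious: {fraction_wrong:.2f}")
# With these assumptions about 20% of 'discoveries' are false; with a less
# favourable prior or lower power the figure is higher still.
```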

While I agree with the Nature piece that there’s a problem, I don’t agree with the suggestion that it can be solved simply by choosing stricter criteria, i.e. a p-value of 0.005 rather than 0.05 or, in the case of particle physics, a 5σ standard (which translates to a one-sided p-value of about 3×10⁻⁷). While it is true that this would throw out a lot of flaky ‘two-sigma’ results, it doesn’t alter the basic problem, which is that the frequentist approach to hypothesis testing is intrinsically confusing compared to the logically clearer Bayesian approach. In particular, most of the time the p-value is an answer to a question which is quite different from that which a scientist would actually want to ask, namely what the data have to say about the probability of a specific hypothesis being true, or sometimes whether the data imply one hypothesis more strongly than another. I’ve banged on about Bayesian methods quite enough on this blog so I won’t repeat the arguments here, except to say that such approaches focus on the probability of a hypothesis being right given the data, rather than on properties that the data might have given the hypothesis.
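For reference, here is a quick sketch of how n-sigma thresholds translate into one-sided Gaussian tail probabilities:

```python
# Converting n-sigma thresholds into one-sided Gaussian tail probabilities.
from scipy import stats

for n_sigma in (2, 3, 5):
    p = stats.norm.sf(n_sigma)   # survival function: P(Z > n_sigma)
    print(f"{n_sigma} sigma  ->  p ~ {p:.1e}")
# 2 sigma -> ~2.3e-02, 3 sigma -> ~1.3e-03, 5 sigma -> ~2.9e-07 (one-sided)
```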

I feel so strongly about this that if I had my way I’d ban p-values altogether…

Not that it’s always easy to implement a Bayesian approach. It’s especially difficult when the data are affected by complicated noise statistics and selection effects, and/or when it is hard to formulate a hypothesis test rigorously because one does not have a clear alternative hypothesis in mind. Experimentalists (including experimental particle physicists) seem to prefer to accept the limitations of the frequentist approach rather than tackle the admittedly very challenging problems of going Bayesian. In fact, in my experience, it seems that those scientists who approach data from a theoretical perspective are almost exclusively Bayesian, while those of an experimental or observational bent stick to their frequentist guns.

Coincidentally a paper on the arXiv not long ago discussed an interesting apparent paradox in hypothesis testing that arises in the context of high energy physics, which I thought I’d share here. Here is the abstract:

The Jeffreys-Lindley paradox displays how the use of a p-value (or number of standard deviations z) in a frequentist hypothesis test can lead to inferences that are radically different from those of a Bayesian hypothesis test in the form advocated by Harold Jeffreys in the 1930’s and common today. The setting is the test of a point null (such as the Standard Model of elementary particle physics) versus a composite alternative (such as the Standard Model plus a new force of nature with unknown strength). The p-value, as well as the ratio of the likelihood under the null to the maximized likelihood under the alternative, can both strongly disfavor the null, while the Bayesian posterior probability for the null can be arbitrarily large. The professional statistics literature has many impassioned comments on the paradox, yet there is no consensus either on its relevance to scientific communication or on the correct resolution. I believe that the paradox is quite relevant to frontier research in high energy physics, where the model assumptions can evidently be quite different from those in other sciences. This paper is an attempt to explain the situation to both physicists and statisticians, in hopes that further progress can be made.

This paradox isn’t a paradox at all; the different approaches give different answers because they ask different questions. Both could be right, but I firmly believe that one of them answers the wrong question.
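If it helps, here is a toy numerical version of the paradox, with numbers I have made up rather than anything taken from the paper: a ‘3σ’ measurement gives a small p-value, yet with a deliberately broad prior on the alternative the Bayesian posterior probability of the point null stays close to one half:

```python
# Toy illustration of the Jeffreys-Lindley paradox (assumed numbers, not from the paper).
# Data: a sample mean xbar with standard error se. Null: theta = 0.
# Alternative: theta ~ N(0, tau^2) with a deliberately broad prior (tau = 1).
import numpy as np
from scipy import stats

n, sigma, tau = 10_000, 1.0, 1.0
se = sigma / np.sqrt(n)
xbar = 3 * se                                        # a '3 sigma' measurement

p_value = 2 * stats.norm.sf(abs(xbar) / se)          # two-sided frequentist p-value

# Marginal likelihood of the data under each hypothesis:
like_null = stats.norm.pdf(xbar, loc=0, scale=se)
like_alt  = stats.norm.pdf(xbar, loc=0, scale=np.sqrt(tau**2 + se**2))
bayes_factor_01 = like_null / like_alt
posterior_null = bayes_factor_01 / (1 + bayes_factor_01)   # assuming prior odds of 1

print(f"p-value ~ {p_value:.4f}")                # ~0.003: 'reject the null'
print(f"P(null | data) ~ {posterior_null:.2f}")  # ~0.53: the null is not disfavoured
```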

Boycott Nature and Science!

Posted in Open Access, Science Politics on December 11, 2013 by telescoper

On Tuesday Randy Schekman, joint winner of the 2013 Nobel Prize for Physiology or Medicine, hit out at academic publishers for the way the most “prestigious” journals (specifically Cell, Nature and Science) publish only the “flashiest” research. I see his announcement as part of a groundswell of opinion that scientists are being increasingly pressured to worry more about the impact factors of the journals they publish in than about the actual science that they do. Cynics have been quick to point out that his statements emerged only after he received the Nobel Prize, and that it’s difficult for younger researchers, who still have to build their careers, to break free from the metrics that are strangling many disciplines. I feel, as do some of my colleagues (such as Garret Cotter of Oxford University), that it’s time for established researchers to make a stand and turn away from those publishers that we feel are having a negative impact on science, and instead to go for alternative modes of publication that are in better keeping with the spirit of open science.

In future, therefore, I’ll be boycotting Nature and Science (I don’t publish in Cell anyway) and I call upon my colleagues to do likewise. Here’s a nice logo (courtesy of Garret Cotter) that you might find useful should you wish to support the boycott.

[Image: the “CNS” boycott logo]

ps. For the record I should point out that during my career I have published four papers in Nature and one in Science.

The Curse of P-values

Posted in Bad Statistics on November 12, 2013 by telescoper

Yesterday evening I noticed a news item in Nature arguing that inappropriate statistical methodology may be undermining the reporting of scientific results. The piece concerns the lack of “reproducibility” of published results.

The article focuses on the p-value, a frequentist concept that corresponds to the probability of obtaining a value at least as large as that obtained for a test statistic under the null hypothesis. To give an example, the null hypothesis might be that two variates are uncorrelated; the test statistic might be the sample correlation coefficient r obtained from a set of bivariate data. If the data were uncorrelated then r would have a known probability distribution, and if the value measured from the sample were such that its numerical value would be exceeded with a probability of 0.05 then the p-value (or significance level) is 0.05.

Anyway, whatever the null hypothesis happens to be, you can see that the way a frequentist would proceed would be to calculate what the distribution of measurements would be if it were true. If the actual measurement is deemed to be unlikely (say that it is so high that only 1% of measurements would turn out that big under the null hypothesis) then you reject the null, in this case with a “level of significance” of 1%. If you don’t reject it then you tacitly accept it unless and until another experiment does persuade you to shift your allegiance.

But the p-value merely specifies the probability that you would reject the null-hypothesis if it were correct. This is what you would call making a Type I error. It says nothing at all about the probability that the null hypothesis is actually a correct description of the data. To make that sort of statement you would need to specify an alternative distribution, calculate the distribution based on it, and hence determine the statistical power of the test, i.e. the probability that you would actually reject the null hypothesis when it is incorrect. To fail to reject the null hypothesis when it’s actually incorrect is to make a Type II error.

If all this stuff about p-values, significance, power and Type I and Type II errors seems a bit bizarre, I think that’s because it is. It’s so bizarre, in fact, that I think most people who quote p-values have absolutely no idea what they really mean.

The Nature story mentioned above argues that, in fact, results quoted with a p-value of 0.05 turn out to be wrong about 25% of the time. There are a number of reasons why this could be the case, including that the p-value is being calculated incorrectly, perhaps because some assumption or other turns out not to be true; a widespread example is assuming that the variates concerned are normally distributed. Unquestioning application of off-the-shelf statistical methods in inappropriate situations is a serious problem in many disciplines, but is particularly prevalent in the social sciences, where samples are typically rather small.

While I agree with the Nature piece that there’s a problem, I don’t agree with the suggestion that it can be solved simply by choosing stricter criteria, i.e. a p-value of 0.005 rather than 0.05. While it is true that this would throw out a lot of flaky ‘two-sigma’ results, it doesn’t alter the basic problem which is that the frequentist approach to hypothesis testing is intrinsically confusing compared to the logically clearer Bayesian approach. In particular, most of the time the p-value is an answer to a question which is quite different from that which a scientist would want to ask, which is what the data have to say about a given hypothesis. I’ve banged on about Bayesian methods quite enough on this blog so I won’t repeat the arguments here, except that such approaches focus on the probability of a hypothesis being right given the data, rather than on properties that the data might have given the hypothesis. If I had my way I’d ban p-values altogether.

Not that it’s always easy to implement a Bayesian approach. Coincidentally a recent paper on the arXiv discussed an interesting apparent paradox in hypothesis testing that arises in the context of high energy physics, which I thought I’d share here. Here is the abstract:

The Jeffreys-Lindley paradox displays how the use of a p-value (or number of standard deviations z) in a frequentist hypothesis test can lead to inferences that are radically different from those of a Bayesian hypothesis test in the form advocated by Harold Jeffreys in the 1930’s and common today. The setting is the test of a point null (such as the Standard Model of elementary particle physics) versus a composite alternative (such as the Standard Model plus a new force of nature with unknown strength). The p-value, as well as the ratio of the likelihood under the null to the maximized likelihood under the alternative, can both strongly disfavor the null, while the Bayesian posterior probability for the null can be arbitrarily large. The professional statistics literature has many impassioned comments on the paradox, yet there is no consensus either on its relevance to scientific communication or on the correct resolution. I believe that the paradox is quite relevant to frontier research in high energy physics, where the model assumptions can evidently be quite different from those in other sciences. This paper is an attempt to explain the situation to both physicists and statisticians, in hopes that further progress can be made.

Rather than tell you what I think about this paradox, I thought I’d invite discussion through the comments box…

A Grand Design Challenge

Posted in Astrohype, The Universe and Stuff on July 20, 2012 by telescoper

While I’m incarcerated at home I thought I might as well make myself useful by passing on an interesting news item I found on the BBC website. This relates to a paper in the latest edition of Nature that reports the discovery of what appears to be a classic “Grand Design” spiral galaxy at a redshift of 2.18. According to the standard big bang cosmology this means that the light we are seeing set out from this object over 10 billion years ago, so we are seeing the galaxy as it was only about 3 billion years after the big bang.
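For what it’s worth, the quoted times are easy to check with a standard cosmology calculator; here is a quick sketch using astropy with the Planck18 parameter set, which is my choice and not necessarily what the paper used:

```python
# Quick check of the quoted times for a galaxy at z = 2.18, assuming the Planck18
# cosmology (the paper may have adopted slightly different parameters).
from astropy.cosmology import Planck18 as cosmo

z = 2.18
t_lookback = cosmo.lookback_time(z).to_value("Gyr")
t_then = cosmo.age(z).to_value("Gyr")
print(f"Light travel (lookback) time: {t_lookback:.1f} Gyr")  # roughly 10.8 Gyr
print(f"Age of the Universe at z = {z}: {t_then:.1f} Gyr")    # roughly 3.0 Gyr
```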

I found this image of the object – known to its friends as BX442 – and was blown away by it…

…until I saw the dreaded words “artist’s rendering”. The actual image is somewhat less impressive.

But what’s really interesting about the study reported in Nature is the set of questions it raises about how this object fits into our understanding of spiral galaxy formation. According to the prevailing paradigm, galaxies form hierarchically by progressively merging smaller clumps into bigger ones. The general expectation is that at high redshift – corresponding to earlier stages of the formation process – galaxies are rather clumpy and disturbed; the spiral structure we see in nearby galaxies is rather flimsy and easily disrupted, so it’s quite surprising to see this one. Does BX442 live in an especially quiet environment? Have we seen few high-redshift spirals because they are rare, or because they are hard to find? Answers to these and other questions will only be found by doing systematic surveys to establish the frequency and distribution of objects like this, as well as the details of their internal kinematics.

Quite Interesting.