Archive for galaxy clustering

Two New Publications at the Open Journal of Astrophysics

Posted in OJAp Papers, Open Access, The Universe and Stuff on February 24, 2024 by telescoper

It’s Saturday morning in Sydney, and time to post another update relating to the Open Journal of Astrophysics. Since the last update we have published two more papers, taking the count in Volume 7 (2024) up to 15 and the total published by OJAp up to 130. I should have posted these before leaving but it slipped my mind.

The first paper of the most recent pair – published on Thursday 22nd February – is “Modelling cross-correlations of ultra-high-energy cosmic rays and galaxies” by Federico Urban (Prague, Czech Republic), Stefano Camera (Torino, Italy) and David Alonso (Oxford, UK). It presents a discussion of possible statistical correlations, in various models, between the arrival directions of Ultra-High-Energy Cosmic Rays (UHECRs) and structure in the galaxy distribution, and of whether or not this signal could be measurable. This one is in the folder marked “High-Energy Astrophysical Phenomena“.

Here is a screen grab of the overlay which includes the abstract:

You can click on the image of the overlay to make it larger should you wish to do so. You can find the officially accepted version of the paper on the arXiv here.

The second paper was published on Friday 23rd February and has the title “The IA Guide: A Breakdown of Intrinsic Alignment Formalisms”. The authors are: Claire Lamman (Harvard, USA); Eleni Tsaprazi (Stockholm, Sweden); Jingjing Shi (Tokyo, Japan); Nikolina Niko Šarčević (Newcastle, UK); Susan Pyne (UCL, UK); Elisa Legnani (Barcelona, Spain); and Tassia Ferreira (Oxford, UK). This one, which is in the folder marked Cosmology and Non-Galactic Astrophysics, presents a review of Intrinsic Alignments, i.e. physical correlations involving galaxy shapes, galaxy spins, and larger-scale structure, which are especially important for weak gravitational lensing.

Here is a screen grab of the overlay which includes the abstract:


You can click on the image of the overlay to make it larger should you wish to do so. You can find the officially accepted version of the paper on the arXiv here.

That concludes this week’s update!

New Publication at the Open Journal of Astrophysics

Posted in OJAp Papers, The Universe and Stuff on April 25, 2023 by telescoper

It’s time once more to announce a new paper at the Open Journal of Astrophysics. The latest paper is the 13th paper so far in Volume 6 (2023) and the 78th in all. This one is another for the folder marked Cosmology and Non-Galactic Astrophysics and its title is “The catalog-to-cosmology framework for weak lensing and galaxy clustering for LSST”.

The lead author is Judit Prat of the University of Chicago (Illinois, USA) and there are 21 co-authors from elsewhere in the USA and in the UK. The paper is written on behalf of the LSST Dark Energy Science Collaboration (LSST DESC), which is the international science collaboration that will make high-accuracy measurements of fundamental cosmological parameters using data from the Rubin Observatory Legacy Survey of Space and Time (LSST). The OJAp has published a number of papers involving LSST DESC, and I’m very happy that such an important consortium has chosen to publish with us.

Here is a screen grab of the overlay which includes the abstract:

You can click on the image of the overlay to make it larger should you wish to do so. You can find the officially accepted version of the paper on the arXiv here.

New Publication at the Open Journal of Astrophysics

Posted in OJAp Papers, Open Access, The Universe and Stuff on August 24, 2022 by telescoper

It’s time once again for me to announce another new paper at the Open Journal of Astrophysics. The new paper, published yesterday, is the 12th paper in Volume 5 (2022) and the 60th in all. The latest publication is entitled “Minkowski Functionals in Joint Galaxy Clustering & Weak Lensing Analyses” and the authors are Nisha Grewal, Joe Zuntz and Tilman Tröster of the Institute for Astronomy in Edinburgh and Alexandra Amon of the Institute of Astronomy in Cambridge. The paper is in the folder marked Cosmology and Non-Galactic Astrophysics.

Incidentally, Dr Alexandra Amon is the winner of this year’s Caroline Herschel Lectureship in Astronomy, so congratulations to her for that too!

The new paper is about the application of topological characteristics known as Minkowski Functionals to cosmological data. This approach has been used in the past to study the pattern of cosmic microwave background temperature fluctuations; see e.g. here for one of my forays into this area way back in 2008. Now there are more high-quality datasets besides the CMB, so there are more opportunities to use this elegant approach. Perhaps I should do a blog post about Minkowski Functionals? Somewhat to my surprise I can’t find anything on that topic in my back catalogue here In The Dark.

Anyway, here is a screen grab of the overlay which includes the abstract:


You can click on the image to make it larger should you wish to do so. You can find the accepted version of the paper on the arXiv here.

GAA Clustering

Posted in Bad Statistics, GAA, The Universe and Stuff on July 25, 2022 by telescoper
The distribution of GAA pitches in Ireland

The above picture was doing the rounds on Twitter yesterday ahead of this year’s All-Ireland Football Final at Croke Park (won by favourites Kerry despite a valiant effort from Galway, who led for much of the game and didn’t play at all like underdogs).

The picture above shows the distribution of Gaelic Athletic Association (GAA) grounds around Ireland. In case you didn’t know, Hurling and Gaelic Football are played on the same pitch, with the same goals and markings on the field. The first thing you notice is that the grounds are plentiful! Obviously the distribution is clustered around major population centres – Dublin, Cork, Limerick and Galway are particularly clear – but other than that the distribution is quite uniform, though in less populated areas the grounds tend to be less densely packed.

The eye is also drawn to filamentary features, probably related to major arterial roads. People need to be able to get to the grounds, after all. Or am I reading too much into these apparent structures? The eye is notoriously keen to see patterns where none really exist, a point I’ve made repeatedly on this blog in the context of galaxy clustering.

The statistical description of clustered point patterns is a fascinating subject, because it makes contact with the way in which our eyes and brain perceive pattern. I’ve spent a large part of my research career trying to figure out efficient ways of quantifying pattern in an objective way and I can tell you it’s not easy, especially when the data are prone to systematic errors and glitches. I can only touch on the subject here, but to see what I am talking about look at the two patterns below:

You will have to take my word for it that one of these is a realization of a two-dimensional Poisson point process and the other contains correlations between the points. One therefore has a real pattern to it, and one is a realization of a completely unstructured random process.

random or non-random?

I show this example in popular talks and get the audience to vote on which one is the random one. The vast majority usually think that the one on the right is random and the one on the left is the one with structure to it. It is not hard to see why. The right-hand pattern is very smooth (what one would naively expect for a constant probability of finding a point at any position in the two-dimensional space), whereas the left-hand one seems to offer a profusion of linear, filamentary features and densely concentrated clusters.

In fact, it’s the picture on the left that was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!

The right-hand process is also generated by a Monte Carlo technique, but the algorithm is more complicated. In this case the presence of a point at some location suppresses the probability of having other points in the vicinity. Each event has a zone of avoidance around it; the points are therefore anticorrelated. The result of this is that the pattern is much smoother than a truly random process should be. In fact, this simulation has nothing to do with galaxy clustering really. The algorithm used to generate it was meant to mimic the behaviour of glow-worms, which tend to eat each other if they get too close. That’s why they spread themselves out in space more uniformly than in the random pattern.
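For the curious, here is a rough Python sketch (assuming NumPy) of how two such patterns can be generated. The hard-core sampler below is a simple sequential-inhibition stand-in for the glow-worm algorithm, not the original code:

```python
import numpy as np

rng = np.random.default_rng(42)

def poisson_pattern(n, size=1.0):
    """Completely random (Poisson) pattern: points independent and uniform."""
    return rng.uniform(0, size, (n, 2))

def hard_core_pattern(n, r_min, size=1.0, max_tries=100_000):
    """Anticorrelated pattern: each accepted point carries a 'zone of
    avoidance' of radius r_min in which later candidates are rejected."""
    points = []
    for _ in range(max_tries):
        if len(points) == n:
            break
        candidate = rng.uniform(0, size, 2)
        if all(np.hypot(*(candidate - q)) >= r_min for q in points):
            points.append(candidate)
    return np.array(points)

clustered_looking = poisson_pattern(200)       # the "left-hand" kind of pattern
smooth_looking = hard_core_pattern(200, 0.04)  # the "right-hand" kind

def min_nn_distance(pts):
    """Smallest nearest-neighbour separation in the pattern."""
    d = np.hypot(pts[:, None, 0] - pts[None, :, 0],
                 pts[:, None, 1] - pts[None, :, 1])
    np.fill_diagonal(d, np.inf)
    return d.min()

# The Poisson pattern happily produces very close pairs (which the eye reads
# as clusters); the hard-core pattern cannot, by construction.
print(min_nn_distance(clustered_looking))
print(min_nn_distance(smooth_looking))  # >= 0.04 by construction
```

The nearest-neighbour check makes the difference between the two patterns quantitative rather than a matter of visual impression.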

Incidentally, I got both pictures from Stephen Jay Gould’s collection of essays Bully for Brontosaurus and used them, with appropriate credit and copyright permission, in my own book From Cosmos to Chaos.

The tendency to find things that are not there is quite well known to astronomers. The constellations which we all recognize so easily are not physical associations of stars, but are just chance alignments on the sky of things at vastly different distances in space. That is not to say that they are random, but the pattern they form is not caused by direct correlations between the stars. Galaxies form real three-dimensional physical associations through their direct gravitational effect on one another.

People are actually pretty hopeless at understanding what “really” random processes look like, probably because the word random is used so often in very imprecise ways and they don’t know what it means in a specific context like this.  The point about random processes, even simpler ones like repeated tossing of a coin, is that coincidences happen much more frequently than one might suppose.

I suppose there is an evolutionary reason why our brains like to impose order on things in a general way. More specifically scientists often use perceived patterns in order to construct hypotheses. However these hypotheses must be tested objectively and often the initial impressions turn out to be figments of the imagination, like the canals on Mars.

R.I.P. Sir David Cox (1924-2022)

Posted in Biographical, mathematics, The Universe and Stuff on January 21, 2022 by telescoper

I was saddened to hear a few days ago that the eminent statistician David Cox has passed away at the age of 97. I didn’t know Professor Cox personally – I met him only once, at a joint astronomy-statistics meeting at (I think) the Royal Astronomical Society back in the day – but I learnt a huge amount from books he co-wrote, despite the fact that he was of the frequentist persuasion. Three examples from my bookshelf are shown above.

I started my DPhil in 1985 with virtually no formal study of statistics under my belt, so I had to follow a steep learning curve, and I was helped enormously by these books. I bought the book on Point Processes so as to understand some of the ideas being applied to galaxy clustering. It’s only a short book but it’s crammed with interesting ideas. Cox & Miller on Stochastic Processes is likewise a classic.

I know I’m not the only person in astrophysics whose career has been influenced by David Cox and I’m sure there are many other disciplines that have benefited from his knowledge.

Among many other awards, David Cox was elected a Fellow of the Royal Society in 1973 and knighted in 1985.

Rest in peace Sir David Cox (1924-2022)

Cosmology with the Minimal Spanning Tree

Posted in The Universe and Stuff on July 8, 2019 by telescoper

There’s a nice paper on the arXiv (by Naidoo et al) with the abstract:

The code mentioned at the end can be found here.

The appearance of this paper gives me an excuse to mention that I actually wrote a paper (with Russell Pearson) on the use of the Minimal (or Minimum) Spanning Tree (MST) to analyze galaxy clustering way back in 1995.

Here’s how we described the Minimal Spanning Tree in that old paper:

Strictly speaking, we used the Euclidean Minimum Spanning Tree, in which the total length of the lines connecting a set of points into a tree is minimized. In more general cases a weight can be assigned to each link that is not necessarily defined simply by its length. Here is a visual illustration (which I think we drew by hand!)

You can think of the MST as a sort of pre-processing technique which accentuates linear features in a point process that might otherwise get lost in shot noise. Once one has a tree (pruned and/or separated as necessary) one can then extract various statistical properties in order to quantify the pattern present.
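As an illustration of the idea (not the code from our 1995 paper, nor the Naidoo et al. package), here is a minimal sketch of how one might build a Euclidean MST from a point set using standard SciPy tools:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
points = rng.uniform(0, 1, (50, 2))   # a toy 2-d "galaxy catalogue"

# Complete graph of pairwise Euclidean separations; the MST keeps the
# subset of edges that connects every point with minimum total length.
dist = squareform(pdist(points))
mst = minimum_spanning_tree(dist)     # sparse matrix; nonzeros are tree edges

edges = np.transpose(mst.nonzero())   # (n - 1) edges for n points
lengths = mst.data

print(len(edges))       # 49
print(lengths.sum())    # total tree length
```

Statistics of the edge-length distribution (mean, spread, tail) can then be used to quantify filamentary structure, typically after pruning the longest edges or separating the tree into subtrees as described above.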

Way back in 1995 there were far fewer datasets available to which to apply this method and it didn’t catch on at the time. Now, with the ever-increasing availability of spectroscopic redshift surveys, maybe its time has come at last! I look forward to playing with the Python code in due course!


Poisson (d’Avril) Point Processes

Posted in Uncategorized on April 2, 2019 by telescoper

I was very unimpressed by yesterday’s batch of April Fool jokes. Some of them were just too obvious:

I’m glad I didn’t try to do one.

Anyway, I noticed that an old post of mine was getting some traffic and when I investigated I found that some of the links to pictures were dead. So I’ve decided to refresh it and post again.

–0–

I’ve got a thing about randomness. For a start I don’t like the word, because it covers such a multitude of sins. People talk about there being randomness in nature when what they really mean is that they don’t know how to predict outcomes perfectly. That’s not quite the same thing as things being inherently unpredictable; statements about the nature of reality are ontological, whereas I think randomness is only a useful concept in an epistemological sense. It describes our lack of knowledge: just because we don’t know how to predict doesn’t mean that it can’t be predicted.

Nevertheless there are useful mathematical definitions of randomness and it is also (sometimes) useful to make mathematical models that display random behaviour in a well-defined sense, especially in situations where one has to take into account the effects of noise.

I thought it would be fun to illustrate one such model. In a point process, the random element is a “dot” that occurs at some location in time or space. Such processes occur in a wide range of contexts: arrivals of buses at a bus stop, photons in a detector, darts on a dartboard, and so on.

Let us suppose that we think of such a process happening in time, although what follows can straightforwardly be generalised to things happening over an area (such as a dartboard) or within some higher-dimensional region. It is also possible to invest the points with some other attributes; processes like this are sometimes called marked point processes, but I won’t discuss them here.

The “most” random way of constructing a simple point process is to assume that each event happens independently of every other event, and that there is a constant probability per unit time of an event happening. This type of process is called a Poisson process, after the French mathematician Siméon-Denis Poisson, who was born in 1781. He was one of the most creative and original physicists of all time: besides fundamental work on electrostatics and the theory of magnetism for which he is famous, he also built greatly upon Laplace’s work in probability theory. His principal result was to derive a formula giving the number of random events if the probability of each one is very low. The Poisson distribution, as it is now known and which I will come to shortly, is related to this original calculation; it was subsequently shown that this distribution amounts to a limiting form of the binomial distribution. Just to add to the connections between probability theory and astronomy, it is worth mentioning that in 1833 Poisson wrote an important paper on the motion of the Moon.

In a finite interval of duration T the mean (or expected) number of events for a Poisson process will obviously just be proportional to the product of the rate per unit time and T itself; call this product λ.

The full distribution is then of the form:

P(x) = λ^x e^(−λ) / x!

This gives the probability that a finite interval contains exactly x events. It can be neatly derived from the binomial distribution by dividing the interval into a very large number of very tiny pieces, each one of which becomes a Bernoulli trial. The probability of success (i.e. of an event occurring) in each trial is extremely small, but the number of trials becomes extremely large in such a way that the mean number of successes is λ. In this limit the binomial distribution takes the form of the above expression. The variance of this distribution is interesting: it is also λ. This means that the typical fluctuations within the interval are of order the square root of λ on a mean level of λ, so the fractional variation is of the famous “one over root n” form that is a useful estimate of the expected variation in point processes. Indeed, it’s a useful rule-of-thumb for estimating likely fluctuation levels in a host of statistical situations.

If football were a Poisson process with a mean number of goals per game of, say, 2 then one would expect most games to have 2 plus or minus 1.4 (the square root of 2) goals, i.e. between about 0.6 and 3.4. That is actually not far from what is observed: the distribution of goals per game in football matches is actually quite close to a Poisson distribution.

This idea can be straightforwardly extended to higher-dimensional processes. If points are scattered over an area with a constant probability per unit area then the mean number in a finite area will also be some number λ and the same formula applies.

As a matter of fact I first learned about the Poisson distribution when I was at school, doing A-level mathematics (which in those days actually included some mathematics). The example used by the teacher to illustrate this particular bit of probability theory was a two-dimensional one from biology. The skin of a fish was divided into little squares of equal area, and the number of parasites found in each square was counted. A histogram of these numbers accurately follows the Poisson form. For years I laboured under the delusion that it was given this name because it was something to do with fish, but then I never was very quick on the uptake.

This is all very well, but point processes are not always of this Poisson form. Points can be clustered, so that having one point at a given position increases the conditional probability of having others nearby. For example, galaxies like those shown in the nice picture are distributed throughout space in a clustered pattern that is very far from the Poisson form. But it’s very difficult to tell from just looking at the picture. What is needed is a rigorous statistical analysis.


The statistical description of clustered point patterns is a fascinating subject, because it makes contact with the way in which our eyes and brain perceive pattern. I’ve spent a large part of my research career trying to figure out efficient ways of quantifying pattern in an objective way and I can tell you it’s not easy, especially when the data are prone to systematic errors and glitches. I can only touch on the subject here, but to see what I am talking about look at the two patterns below:


You will have to take my word for it that one of these is a realization of a two-dimensional Poisson point process and the other contains correlations between the points. One therefore has a real pattern to it, and one is a realization of a completely unstructured random process.

I show this example in popular talks and get the audience to vote on which one is the random one. The vast majority usually think that the top one is random and the bottom one is the one with structure to it. It is not hard to see why. The top pattern is very smooth (what one would naively expect for a constant probability of finding a point at any position in the two-dimensional space), whereas the bottom one seems to offer a profusion of linear, filamentary features and densely concentrated clusters.

In fact, it’s the bottom picture that was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!

The top process is also generated by a Monte Carlo technique, but the algorithm is more complicated. In this case the presence of a point at some location suppresses the probability of having other points in the vicinity. Each event has a zone of avoidance around it; the points are therefore anticorrelated. The result of this is that the pattern is much smoother than a truly random process should be. In fact, this simulation has nothing to do with galaxy clustering really. The algorithm used to generate it was meant to mimic the behaviour of glow-worms, which tend to eat each other if they get too close. That’s why they spread themselves out in space more uniformly than in the random pattern.

Incidentally, I got both pictures from Stephen Jay Gould’s collection of essays Bully for Brontosaurus and used them, with appropriate credit and copyright permission, in my own book From Cosmos to Chaos. I forgot to say this in earlier versions of this post.

The tendency to find things that are not there is quite well known to astronomers. The constellations which we all recognize so easily are not physical associations of stars, but are just chance alignments on the sky of things at vastly different distances in space. That is not to say that they are random, but the pattern they form is not caused by direct correlations between the stars. Galaxies form real three-dimensional physical associations through their direct gravitational effect on one another.

People are actually pretty hopeless at understanding what “really” random processes look like, probably because the word random is used so often in very imprecise ways and they don’t know what it means in a specific context like this.  The point about random processes, even simpler ones like repeated tossing of a coin, is that coincidences happen much more frequently than one might suppose.

I suppose there is an evolutionary reason why our brains like to impose order on things in a general way. More specifically scientists often use perceived patterns in order to construct hypotheses. However these hypotheses must be tested objectively and often the initial impressions turn out to be figments of the imagination, like the canals on Mars.

Now, I think I’ll complain to wordpress about the widget that links pages to a “random blog post”. I’m sure it’s not really random….


XXL Map of Galaxy Clusters

Posted in The Universe and Stuff with tags , on December 18, 2015 by telescoper

The press office at the European Space Agency is apparently determined to release as much interesting material as possible in the week before Christmas so that as few people as possible will notice. I mentioned one yesterday, and here is another.


The map is of preliminary data from the XXL Cluster Survey, the largest survey of galaxy clusters ever undertaken, and was obtained using the XMM-Newton telescope. (Thanks to various people, including Ben Maughan below who pointed out the error I made by relying on the accuracy of the ESA Press Release.)

The press-release marks the publication of the first results from this survey on 15th December 2015. The clusters of galaxies surveyed are prominent features of the large-scale structure of the Universe, and to better understand them is to better understand this structure and the circumstances that led to its evolution. So far 450 clusters have been identified – they are indicated by the red rings in the picture. Note that the full moon is included at the top left to show the scale of the sky area surveyed.

If you’ll pardon a touch of autobiography I should point out that my very first publication was on galaxy clusters. It came out in 1986 and was based on data from optically-selected clusters; X-ray emission from the very hot gas clusters contain is a much better way to identify them than counting galaxies by their starlight. Cluster cosmology has moved on a lot. So has everything else in cosmology, come to think of it!


When random doesn’t seem random..

Posted in Crosswords, The Universe and Stuff on February 21, 2015 by telescoper

A few months have passed since I last won a dictionary as a prize in the Independent Crossword competition. That’s nothing remarkable in itself, but since my average rate of dictionary accumulation has been about one a month over the last few years, it seems a bit of a lull.  Have I forgotten how to do crosswords and keep sending in wrong solutions? Is the Royal Mail intercepting my post? Has the number of correct entries per week suddenly increased, reducing my odds of winning? Have the competition organizers turned against me?

In fact, statistically speaking, there’s nothing significant in this gap. Even if my grids are all correct, the number of correct entries has remained constant, and the winner is drawn at random from those submitted (i.e. in such a way that all correct entries are equally likely to be drawn), a relatively long unsuccessful period such as I am experiencing at the moment is not at all improbable. The point is that such runs are far more likely in a truly random process than most people imagine, as indeed are runs of successes. Chance coincidences happen more often than you think.

I try this out in lectures sometimes, by asking a member of the audience to generate a random sequence of noughts and ones in their head. It seems people are so conscious that the number of ones should be roughly equal to the number of noughts that they impose this balance as they go along. Almost universally, the supposedly random sequences people produce only have very short runs of 1s or 0s because, say, a run like ‘00000’ just seems too unlikely. Well, it is unlikely, but that doesn’t mean it won’t happen. In a truly random binary sequence like this (i.e. one in which 1 and 0 both have a probability of 0.5 and each selection is independent of the others), coincidental runs of consecutive 0s and 1s happen with surprising frequency. Try it yourself, with a coin.
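The point about runs can be made quantitative with a few lines of Python (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(123)

def longest_run(bits):
    """Length of the longest run of identical consecutive symbols."""
    best = cur = 1
    for a, b in zip(bits, bits[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

# Hand-written "random" sequences rarely contain runs longer than 3 or 4,
# but genuine fair-coin sequences of length 100 almost always do:
n_trials, n_flips = 10_000, 100
runs = [longest_run(rng.integers(0, 2, n_flips)) for _ in range(n_trials)]

print(np.mean(runs))                    # average longest run: about 7
print(np.mean([r >= 5 for r in runs]))  # a run of 5+ appears in well over 90% of sequences
```

So the very feature people edit out of their imagined sequences is almost guaranteed to be present in a real one.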

Coincidentally, the subject of randomness was suggested to me independently yesterday by an anonymous email correspondent by the name of John Peacock, as I have blogged about it before (one particular post on this topic is actually one of this blog’s most popular articles). What triggered this was a piece about music players such as Spotify (whatever that is) which have a “random play” feature. Apparently people don’t accept that it is “really random” because of the number of times the same track comes up. To deal with this “problem”, experts are working on algorithms that don’t actually play things randomly but in such a way that accords with what people think randomness means.
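The perceived excess of repeats is essentially the birthday problem in disguise. A quick sketch, with purely illustrative numbers (no knowledge of Spotify’s actual algorithm is assumed):

```python
import math

def p_repeat(n_tracks, n_plays):
    """Probability that some track comes up more than once in n_plays
    independent, uniform draws from a library of n_tracks (the birthday
    problem in disguise)."""
    p_all_distinct = 1.0
    for k in range(n_plays):
        p_all_distinct *= (n_tracks - k) / n_tracks
    return 1.0 - p_all_distinct

# With a library of 1000 tracks, a repeat within just 40 truly random
# plays is already more likely than not:
print(round(p_repeat(1000, 40), 3))

# The classic birthday-problem check: 23 people, 365 days
print(round(p_repeat(365, 23), 3))   # about 0.507
```

In other words, hearing the same track twice in an afternoon is exactly what genuine randomness looks like.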

I think this fiddling is a very bad idea. People understand probability so poorly anyway that attempting to redefine the word’s meaning is just going to add confusion. You wouldn’t accept a casino that used loaded dice, so why allow cheating in another context? Far better for all concerned for the general public to understand what randomness is and, perhaps more importantly, what it looks like.

I have to confess that I don’t really like the word “randomness”, but I haven’t got time right now for a rant about it. There are, however, useful mathematical definitions of randomness and it is also (sometimes) useful to make mathematical models that display random behaviour in a well-defined sense, especially in situations where one has to take into account the effects of noise.

I thought it would be fun to illustrate one such model. In a point process, the random element is a “dot” that occurs at some location in time or space. Such processes can be defined in one or more dimensions and relate to a wide range of situations: arrivals of buses at a bus stop, photons in a detector, darts on a dartboard, and so on.

The statistical description of clustered point patterns is a fascinating subject, because it makes contact with the way in which our eyes and brain perceive pattern. I’ve spent a large part of my research career trying to figure out efficient ways of quantifying pattern in an objective way and I can tell you it’s not easy, especially when the data are prone to systematic errors and glitches. I can only touch on the subject here, but to see what I am talking about look at the two patterns below:


You will have to take my word for it that one of these is a realization of a two-dimensional Poisson point process and the other contains correlations between the points. One therefore has a real pattern to it, and one is a realization of a completely unstructured random process.

I show this example in popular talks and get the audience to vote on which one is the random one. In fact, I did this just a few weeks ago during a lecture in our module Quarks to Cosmos, which attempts to explain scientific concepts to non-science students. As usual when I do this, I found that the vast majority thought that the top one is random and the bottom one is the one with structure to it. It is not hard to see why. The top pattern is very smooth (what one would naively expect for a constant probability of finding a point at any position in the two-dimensional space), whereas the bottom one seems to offer a profusion of linear, filamentary features and densely concentrated clusters.

In fact, it’s the bottom picture that was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent in the second example is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!

The top process is also generated by a Monte Carlo technique, but the algorithm is more complicated. In this case the presence of a point at some location suppresses the probability of having other points in the vicinity. Each event has a zone of avoidance around it; the points are therefore anticorrelated. The result of this is that the pattern is much smoother than a truly random process should be. In fact, this simulation has nothing to do with galaxy clustering really. The algorithm used to generate it was meant to mimic the behaviour of glow-worms, which tend to eat each other if they get too close. That’s why they spread themselves out in space more uniformly than in the “really” random pattern.

I assume that Spotify’s non-random play algorithm will have the effect of producing a one-dimensional version of the top pattern, i.e. one with far too few coincidences to be genuinely random.

Incidentally, I got both pictures from Stephen Jay Gould’s collection of essays Bully for Brontosaurus and used them, with appropriate credit and copyright permission, in my own book From Cosmos to Chaos.

The tendency to find things that are not there is quite well known to astronomers. The constellations which we all recognize so easily are not physical associations of stars, but are just chance alignments on the sky of things at vastly different distances in space. That is not to say that they are random, but the pattern they form is not caused by direct correlations between the stars. Galaxies form real three-dimensional physical associations through their direct gravitational effect on one another.

People are actually pretty hopeless at understanding what “really” random processes look like, probably because the word random is used so often in very imprecise ways and they don’t know what it means in a specific context like this. The point about random processes, even simple ones like the repeated tossing of a coin, is that coincidences happen much more frequently than one might suppose.
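A quick simulation makes the point about coincidences. In 100 tosses of a fair coin, a run of six or more identical outcomes feels like a fluke, yet it happens in the large majority of sequences (this is my own illustrative sketch, not a result from the post):

```python
import random

def longest_run(n_tosses, rng):
    """Length of the longest run of identical outcomes in n_tosses coin flips."""
    longest = current = 1
    prev = rng.random() < 0.5
    for _ in range(n_tosses - 1):
        toss = rng.random() < 0.5
        current = current + 1 if toss == prev else 1
        longest = max(longest, current)
        prev = toss
    return longest

rng = random.Random(0)
trials = [longest_run(100, rng) for _ in range(2000)]
frac = sum(r >= 6 for r in trials) / len(trials)
# frac typically comes out around 0.8: long runs are the norm, not the exception.
```

Most people, asked to write down a “random-looking” sequence of 100 tosses by hand, avoid such long runs entirely, which is exactly how fabricated data are often caught.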

I suppose there is an evolutionary reason why our brains like to impose order on things in a general way. More specifically, scientists often use perceived patterns to construct hypotheses. However, these hypotheses must be tested objectively, and often the initial impressions turn out to be figments of the imagination, like the canals on Mars.

Perhaps I should complain to WordPress about the widget that links pages to a “random blog post”. I’m sure it’s not really random….

Galaxies, Glow-worms and Chicken Eyes

Posted in Bad Statistics, The Universe and Stuff with tags , , , , , , , , on February 26, 2014 by telescoper

I just came across a news item based on a research article in Physical Review E by Jiao et al. with the abstract:

Optimal spatial sampling of light rigorously requires that identical photoreceptors be arranged in perfectly regular arrays in two dimensions. Examples of such perfect arrays in nature include the compound eyes of insects and the nearly crystalline photoreceptor patterns of some fish and reptiles. Birds are highly visual animals with five different cone photoreceptor subtypes, yet their photoreceptor patterns are not perfectly regular. By analyzing the chicken cone photoreceptor system consisting of five different cell types using a variety of sensitive microstructural descriptors, we find that the disordered photoreceptor patterns are “hyperuniform” (exhibiting vanishing infinite-wavelength density fluctuations), a property that had heretofore been identified in a unique subset of physical systems, but had never been observed in any living organism. Remarkably, the patterns of both the total population and the individual cell types are simultaneously hyperuniform. We term such patterns “multihyperuniform” because multiple distinct subsets of the overall point pattern are themselves hyperuniform. We have devised a unique multiscale cell packing model in two dimensions that suggests that photoreceptor types interact with both short- and long-ranged repulsive forces and that the resultant competition between the types gives rise to the aforementioned singular spatial features characterizing the system, including multihyperuniformity. These findings suggest that a disordered hyperuniform pattern may represent the most uniform sampling arrangement attainable in the avian system, given intrinsic packing constraints within the photoreceptor epithelium. In addition, they show how fundamental physical constraints can change the course of a biological optimization process. Our results suggest that multihyperuniform disordered structures have implications for the design of materials with novel physical properties and therefore may represent a fruitful area for future research.

The point made in the paper is that the photoreceptors found in the eyes of chickens possess a property called disordered hyperuniformity, which means that they appear disordered on small scales but exhibit order over large distances. Here’s an illustration:

chicken_eyes
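The defining property is easy to probe numerically. Here is a toy sketch of the idea (mine, not the analysis in the paper): drop square counting windows at random over a point pattern and compare the variance of the counts to their mean. For a Poisson pattern the ratio is close to one; for a pattern that is more uniform on large scales, such as a jittered grid, it is much smaller, and for a genuinely hyperuniform pattern it keeps falling as the window grows.

```python
import random

def count_variance_ratio(points, window=0.1, n_windows=2000, seed=7):
    """Variance-to-mean ratio of counts in randomly placed square windows.
    ~1 for a Poisson pattern; well below 1 for patterns whose large-scale
    density fluctuations are suppressed."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_windows):
        x0 = rng.random() * (1 - window)
        y0 = rng.random() * (1 - window)
        counts.append(sum(x0 <= x < x0 + window and y0 <= y < y0 + window
                          for x, y in points))
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean

rng = random.Random(3)
poisson = [(rng.random(), rng.random()) for _ in range(900)]
jittered = [((i + 0.5 + 0.2 * (rng.random() - 0.5)) / 30,
             (j + 0.5 + 0.2 * (rng.random() - 0.5)) / 30)
            for i in range(30) for j in range(30)]
# count_variance_ratio(poisson) is close to 1, while count_variance_ratio(jittered)
# is far smaller, reflecting suppressed large-scale density fluctuations.
```

The paper’s analysis is of course much more sophisticated than this, but the variance-to-mean comparison captures the essential contrast between the two kinds of disorder.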

It’s an interesting paper, but I’d like to quibble about something it says in the accompanying news story. The caption with the above diagram states

Left: visual cell distribution in chickens; right: a computer-simulation model showing pretty much the exact same thing. The colored dots represent the centers of the chicken’s eye cells.

Well, as someone who has spent much of his research career trying to discern and quantify patterns in collections of points – in my case they tend to be galaxies rather than photoreceptors – I find it difficult to defend the use of the phrase “pretty much the exact same thing”. It’s notoriously difficult to look at realizations of stochastic point processes and decide whether they are statistically similar or not. For that you generally need quite sophisticated mathematical analysis. In fact, to my eye, the two images above don’t look at all like “pretty much the exact same thing”. I’m not at all sure that the model works as well as is claimed, as the statistical analysis presented in the paper is relatively simple: I’d need to see some more quantitative measures of pattern morphology and clustering, especially higher-order correlation functions, before I’d be convinced.
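To be concrete about what I mean by a quantitative measure: the simplest such statistic is the two-point correlation function ξ(r), which can be estimated by comparing pair counts in the data with pair counts in an unclustered random catalogue. Here is a toy sketch using the simple Peebles–Hauser estimator ξ = DD/RR − 1 (illustrative only; serious work uses better estimators, edge corrections, and higher-order statistics):

```python
import random

def pair_fraction(points, r_lo, r_hi):
    """Fraction of distinct pairs with separation in [r_lo, r_hi)."""
    n = len(points)
    hits = 0
    for i in range(n):
        px, py = points[i]
        for j in range(i + 1, n):
            d2 = (px - points[j][0]) ** 2 + (py - points[j][1]) ** 2
            if r_lo ** 2 <= d2 < r_hi ** 2:
                hits += 1
    return hits / (n * (n - 1) / 2)

rng = random.Random(5)
data = [(rng.random(), rng.random()) for _ in range(400)]     # "observed" points
randoms = [(rng.random(), rng.random()) for _ in range(400)]  # unclustered comparison
xi = pair_fraction(data, 0.05, 0.10) / pair_fraction(randoms, 0.05, 0.10) - 1
# With both sets unclustered, xi scatters around zero; a clustered pattern gives
# xi > 0 and an anticorrelated one gives xi < 0 on these scales.
```

Two patterns can have identical two-point statistics and still differ at higher order, which is exactly why eyeballing realizations – or quoting only simple descriptors – is not enough.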

Anyway, all this reminded me of a very old post of mine about the difficulty of discerning patterns in distributions of points. Take the two (not very well scanned) images here as examples:

points

You will have to take my word for it that one of these is a realization of a two-dimensional Poisson point process (which is, in a well-defined sense, completely “random”) and the other contains spatial correlations between the points. One therefore has a real pattern to it, and the other is a realization of a completely unstructured random process.

I sometimes show this example in popular talks and get the audience to vote on which one is the random one. The vast majority usually think that the one on the right is the one that is random and the left one is the one with structure to it. It is not hard to see why. The right-hand pattern is very smooth (what one would naively expect for a constant probability of finding a point at any position in the two-dimensional space), whereas the left one seems to offer a profusion of linear, filamentary features and densely concentrated clusters.

In fact, it’s the left picture that was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!

The right process is also generated by a Monte Carlo technique, but the algorithm is more complicated. In this case the presence of a point at some location suppresses the probability of having other points in the vicinity. Each event has a zone of avoidance around it; the points are therefore anticorrelated. The result of this is that the pattern is much smoother than a truly random process should be. In fact, this simulation has nothing to do with galaxy clustering really. The algorithm used to generate it was meant to mimic the behaviour of glow-worms (a kind of beetle), which tend to eat each other if they get too close. That’s why they spread themselves out in space more uniformly than in the random pattern. In fact, the tendency displayed in this image of the points to spread themselves out more smoothly than a random distribution is in some ways reminiscent of the chicken-eye problem.

The moral of all this is that people are actually pretty hopeless at understanding what “really” random processes look like, probably because the word random is used so often in very imprecise ways and they don’t know what it means in a specific context like this. The point about random processes, even simpler ones like repeated tossing of a coin, is that coincidences happen much more frequently than one might suppose. By the same token, people are also pretty hopeless at figuring out whether two distributions of points resemble each other in some kind of statistical sense, because that can only be made precise if one defines some specific quantitative measure of clustering pattern, which is not easy to do.