Archive for the Bad Statistics Category

The Curse of P-values

Posted in Bad Statistics on November 12, 2013 by telescoper

Yesterday evening I noticed a news item in Nature that argues that inappropriate statistical methodology may be undermining the reporting of scientific results. The article focuses on lack of “reproducibility” of results.

The article focuses on the p-value, a frequentist concept that corresponds to the probability of obtaining a value at least as large as that obtained for a test statistic under the null hypothesis. To give an example, the null hypothesis might be that two variates are uncorrelated; the test statistic might be the sample correlation coefficient r obtained from a set of bivariate data. If the data were uncorrelated then r would have a known probability distribution, and if the value measured from the sample were such that its numerical value would be exceeded with a probability of 0.05 then the p-value (or significance level) is 0.05.
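To make the correlation example concrete, here is an illustrative sketch (not from the Nature article): it estimates the p-value by a permutation test, i.e. by asking how often shuffled, and therefore uncorrelated, versions of the data produce a correlation coefficient at least as large in magnitude as the one observed.

```python
import random

def pearson_r(x, y):
    """Sample correlation coefficient r for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_perm=2000, seed=42):
    """Two-sided p-value for the null hypothesis that x and y are
    uncorrelated: the fraction of shuffled datasets whose |r| is at
    least as large as the observed |r|."""
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    y_shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # shuffling destroys any correlation
        if abs(pearson_r(x, y_shuffled)) >= r_obs:
            count += 1
    return count / n_perm

# Strongly correlated toy data: the estimated p-value comes out tiny,
# so the null hypothesis of zero correlation is rejected.
x = [float(i) for i in range(30)]
y = [2.0 * xi + 1.0 for xi in x]
p = permutation_p_value(x, y)
```

Note that this answers exactly the frequentist question described above: how improbable the observed statistic would be if the null were true, and nothing more.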

Anyway, whatever the null hypothesis happens to be, you can see that the way a frequentist would proceed would be to calculate what the distribution of measurements would be if it were true. If the actual measurement is deemed to be unlikely (say that it is so high that only 1% of measurements would turn out that big under the null hypothesis) then you reject the null, in this case with a “level of significance” of 1%. If you don’t reject it then you tacitly accept it unless and until another experiment does persuade you to shift your allegiance.

But the p-value merely specifies the probability that you would reject the null hypothesis if it were correct. Doing so is what you would call making a Type I error. It says nothing at all about the probability that the null hypothesis is actually a correct description of the data. To make that sort of statement you would need to specify an alternative hypothesis, calculate the distribution of the test statistic under it, and hence determine the statistical power of the test, i.e. the probability that you would actually reject the null hypothesis when it is false. To fail to reject the null hypothesis when it’s actually incorrect is to make a Type II error.
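For what it’s worth, the relationship between significance and power can be written down explicitly in the simplest case, a one-sided z-test for a shift in a mean with known standard deviation. The numbers below are purely illustrative:

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 against the specific
    alternative mu = effect, with known sigma and sample size n.
    alpha is the Type I error rate; 1 - power is the Type II error rate."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha)      # reject H0 if the z-statistic exceeds this
    shift = effect * n ** 0.5 / sigma     # how far the alternative displaces the statistic
    return 1.0 - nd.cdf(z_crit - shift)   # P(reject H0 | alternative is true)

# With no effect, the "power" is just the significance level itself:
# the test rejects a true null 5% of the time.
power_null = z_test_power(0.0, 1.0, 25)
# An effect of half a standard deviation with n = 25 is detected about
# 80% of the time; quadrupling n pushes that close to certainty.
power_25 = z_test_power(0.5, 1.0, 25)
power_100 = z_test_power(0.5, 1.0, 100)
```

The point to take away is that power depends on the alternative you specify, which is exactly the ingredient the bare p-value leaves out.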

If all this stuff about p-values, significance, power and Type I and Type II errors seems a bit bizarre, I think that’s because it is. It’s so bizarre, in fact, that I think most people who quote p-values have absolutely no idea what they really mean.

The Nature story mentioned above argues that results quoted with a p-value of 0.05 turn out to be wrong about 25% of the time. There are a number of reasons why this could be the case, including that the p-value has been calculated incorrectly, perhaps because some assumption or other turns out not to be true; a widespread example is assuming that the variates concerned are normally distributed. Unquestioning application of off-the-shelf statistical methods in inappropriate situations is a serious problem in many disciplines, but it is particularly prevalent in the social sciences, where samples are typically rather small.

While I agree with the Nature piece that there’s a problem, I don’t agree with the suggestion that it can be solved simply by choosing stricter criteria, i.e. a p-value of 0.005 rather than 0.05. While it is true that this would throw out a lot of flaky “two-sigma” results, it doesn’t alter the basic problem, which is that the frequentist approach to hypothesis testing is intrinsically confusing compared to the logically clearer Bayesian approach. In particular, most of the time the p-value answers a question quite different from the one a scientist actually wants to ask, namely what the data have to say about a given hypothesis. I’ve banged on about Bayesian methods quite enough on this blog so I won’t repeat the arguments here, except to say that such approaches focus on the probability of a hypothesis being right given the data, rather than on properties that the data might have given the hypothesis. If I had my way I’d ban p-values altogether.

Not that it’s always easy to implement a Bayesian approach. Coincidentally a recent paper on the arXiv discussed an interesting apparent paradox in hypothesis testing that arises in the context of high energy physics, which I thought I’d share here. Here is the abstract:

The Jeffreys-Lindley paradox displays how the use of a p-value (or number of standard deviations z) in a frequentist hypothesis test can lead to inferences that are radically different from those of a Bayesian hypothesis test in the form advocated by Harold Jeffreys in the 1930’s and common today. The setting is the test of a point null (such as the Standard Model of elementary particle physics) versus a composite alternative (such as the Standard Model plus a new force of nature with unknown strength). The p-value, as well as the ratio of the likelihood under the null to the maximized likelihood under the alternative, can both strongly disfavor the null, while the Bayesian posterior probability for the null can be arbitrarily large. The professional statistics literature has many impassioned comments on the paradox, yet there is no consensus either on its relevance to scientific communication or on the correct resolution. I believe that the paradox is quite relevant to frontier research in high energy physics, where the model assumptions can evidently be quite different from those in other sciences. This paper is an attempt to explain the situation to both physicists and statisticians, in hopes that further progress can be made.

Rather than tell you what I think about this paradox, I thought I’d invite discussion through the comments box…

Tension in Cosmology?

Posted in Astrohype, Bad Statistics, The Universe and Stuff on October 24, 2013 by telescoper

I noticed this abstract (of a paper by Rest et al.) on the arXiv the other day:

We present griz light curves of 146 spectroscopically confirmed Type Ia Supernovae (0.03<z<0.65) discovered during the first 1.5 years of the Pan-STARRS1 Medium Deep Survey. The Pan-STARRS1 natural photometric system is determined by a combination of on-site measurements of the instrument response function and observations of spectrophotometric standard stars. We have investigated spatial and time variations in the photometry, and we find that the systematic uncertainties in the photometric system are currently 1.2% without accounting for the uncertainty in the HST Calspec definition of the AB system. We discuss our efforts to minimize the systematic uncertainties in the photometry. A Hubble diagram is constructed with a subset of 112 SNe Ia (out of the 146) that pass our light curve quality cuts. The cosmological fit to 313 SNe Ia (112 PS1 SNe Ia + 201 low-z SNe Ia), using only SNe and assuming a constant dark energy equation of state and flatness, yields w = -1.015^{+0.319}_{-0.201}(Stat)^{+0.164}_{-0.122}(Sys). When combined with BAO+CMB(Planck)+H0, the analysis yields \Omega_M = 0.277^{+0.010}_{-0.012} and w = -1.186^{+0.076}_{-0.065} including all identified systematics, as spelled out in the companion paper by Scolnic et al. (2013a). The value of w is inconsistent with the cosmological constant value of -1 at the 2.4 sigma level. This tension has been seen in other high-z SN surveys and endures after removing either the BAO or the H0 constraint. If we include WMAP9 CMB constraints instead of those from Planck, we find w = -1.142^{+0.076}_{-0.087}, which diminishes the discord to <2 sigma. We cannot conclude whether the tension with flat ΛCDM is a feature of dark energy, new physics, or a combination of chance and systematic errors. The full Pan-STARRS1 supernova sample will be 3 times as large as this initial sample, which should provide more conclusive results.

The mysterious Pan-STARRS stands for the Panoramic Survey Telescope and Rapid Response System, a set of telescopes, cameras and related computing hardware that monitors the sky from its base in Hawaii. One of the many things this system can do is detect and measure distant supernovae, hence the particular application to cosmology described in the paper. The abstract mentions a preliminary measurement of the parameter w, which for those of you who are not experts in cosmology is usually called the “equation of state” parameter for the dark energy component involved in the standard model. What it describes is the relationship between the pressure P and the energy density ρc² of this mysterious stuff, via the relation P = wρc². The particularly interesting case is w = -1, which corresponds to a cosmological constant term; see here for a technical discussion. However, we don’t know how to explain this dark energy from first principles, so really w is a parameter that describes our ignorance of what is actually going on. In other words, the cosmological constant provides the simplest model of dark energy, but even in that case we don’t know where it comes from, so it might well be something different; estimating w from surveys can therefore tell us whether we’re on the right track or not.

The abstract explains that, within the errors, the Pan-STARRS data on their own are consistent with w = -1. More interestingly, though, combining the supernova observations with others shifts the best-fit value of w towards a value a bit less than -1 (although still with quite a large uncertainty). Incidentally, a value of w less than -1 is generally described as a “phantom” dark energy component. I’ve never really understood why…

So far, estimates of cosmological parameters from different data sets have broadly agreed with each other, hence the application of the word “concordance” to the standard cosmological model. It does seem to be the case, however, that supernova measurements generally push cosmological parameter estimates away from the comfort zone established by other types of observation. Could this apparent discordance be signalling that our ideas are wrong?

That’s the line pursued by a Scientific American article on this paper entitled “Leading Dark Energy Theory Incompatible with New Measurement”. This could be true, but I think it’s a bit early to be taking this line when there are still questions to be answered about the photometric accuracy of the Pan-Starrs survey. The headline I would have picked would be more like “New Measurement (Possibly) Incompatible With Other Measurements of Dark Energy”.

But that would have been boring…

Australia: Cyclones go up to Eleven!

Posted in Bad Statistics on October 14, 2013 by telescoper

I saw a story on the web this morning which points out that Australians can expect 11 cyclones this season.

It’s not a very good headline, because it’s a bit misleading about what the word “expected” means. In fact the number eleven is the average number of cyclones per season, which is not necessarily the number you should expect, despite the fact that “expected value” or “expectation value” is the technical term for the average. If you don’t understand this criticism, ask yourself how many legs you’d expect a randomly-chosen person to have. You’d probably settle on the answer “two”, but that is the most probable number, i.e. the mode, which in this case exceeds the average. If one person in a thousand has only one leg then a group of a thousand people has 1999 legs between them, so the average (or arithmetic mean) is 1.999. Most people therefore have more than the average number of legs…

I’ve always found it quite annoying that physicists use the term “expectation value” to mean “average”, because it implies that the average is the value you would expect. In the example given above you wouldn’t expect a person to have the average number of legs – if you assume that the actual number is an integer, it’s actually impossible to find a person with 1.999 legs! In other words, the probability of finding someone in that group with the average number of legs in the group is exactly zero.
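The point is easy to check numerically; the population below is of course hypothetical:

```python
from statistics import mean, mode

# Hypothetical population: 999 people with two legs, one person with one.
legs = [2] * 999 + [1]

avg = mean(legs)          # the "expectation value": 1.999
most_common = mode(legs)  # the most probable value (the mode): 2

# Nobody in the population actually has the average number of legs.
nobody_has_average = avg not in legs
```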

The same confusion happens when newspapers talk about the “average wage” which is considerably higher than the wage most people receive.

In any case the point is that there is undoubtedly a considerable uncertainty in the prediction of eleven cyclones per season, and one would like to have some idea how large an error bar is associated with that value.

Anyway, statistical pedantry notwithstanding, it is indeed impressive that the number of cyclones in a season goes all the way up to eleven…

Science, Religion and Henry Gee

Posted in Bad Statistics, Books, Talks and Reviews, Science Politics, The Universe and Stuff on September 23, 2013 by telescoper

Last week a piece appeared on the Grauniad website by Henry Gee, who is a Senior Editor at the journal Nature. I was prepared to get a bit snarky about the article when I saw the title, as it reminded me of an old rant about science being just a kind of religion by Simon Jenkins that got me quite annoyed a few years ago. Henry Gee’s article, however, is actually rather more coherent than that and not really deserving of some of the invective being flung at it.

For example, here’s an excerpt that I almost agree with:

One thing that never gets emphasised enough in science, or in schools, or anywhere else, is that no matter how fancy-schmancy your statistical technique, the output is always a probability level (a P-value), the “significance” of which is left for you to judge – based on nothing more concrete or substantive than a feeling, based on the imponderables of personal or shared experience. Statistics, and therefore science, can only advise on probability – they cannot determine The Truth. And Truth, with a capital T, is forever just beyond one’s grasp.

I’ve made the point on this blog many times that, although statistical reasoning lies at the heart of the scientific method, we don’t do anywhere near enough  to teach students how to use probability properly; nor do scientists do enough to explain the uncertainties in their results to decision makers and the general public.  I also agree with the concluding thought, that science isn’t about absolute truths. Unfortunately, Gee undermines his credibility by equating statistical reasoning with p-values which, in my opinion, are a frequentist aberration that contributes greatly to the public misunderstanding of science. Worse, he even gets the wrong statistics wrong…

But the main thing that bothers me about Gee’s article is that he blames scientists for promulgating the myth of “science-as-religion”. I don’t think that’s fair at all. Most scientists I know are perfectly well aware of the limitations of what they do. It’s really the media that want to portray everything in simple black-and-white terms. Some scientists play along, of course, as I comment upon below, but most of us are not priests but pragmatists.

Anyway, this episode gives me the excuse to point out  that I ended a book I wrote in 1998 with a discussion of the image of science as a kind of priesthood which it seems apt to repeat here. The book was about the famous eclipse expedition of 1919 that provided some degree of experimental confirmation of Einstein’s general theory of relativity and which I blogged about at some length last year, on its 90th anniversary.

I decided to post the last few paragraphs here to show that I do think there is a valuable point to be made out of the scientist-as-priest idea. It’s to do with the responsibility scientists have to be honest about the limitations of their research and the uncertainties that surround any new discovery. Science has done great things for humanity, but it is fallible. Too many scientists are too certain about things that are far from proven. This can be damaging to science itself, as well as to the public perception of it. Bandwagons proliferate, stifling original ideas and leading to the construction of self-serving cartels. This is a fertile environment for conspiracy theories to flourish.

To my mind the thing  that really separates science from religion is that science is an investigative process, not a collection of truths. Each answer simply opens up more questions.  The public tends to see science as a collection of “facts” rather than a process of investigation. The scientific method has taught us a great deal about the way our Universe works, not through the exercise of blind faith but through the painstaking interplay of theory, experiment and observation.

This is what I wrote in 1998:

Science does not deal with ‘rights’ and ‘wrongs’. It deals instead with descriptions of reality that are either ‘useful’ or ‘not useful’. Newton’s theory of gravity was not shown to be ‘wrong’ by the eclipse expedition. It was merely shown that there were some phenomena it could not describe, and for which a more sophisticated theory was required. But Newton’s theory still yields perfectly reliable predictions in many situations, including, for example, the timing of total solar eclipses. When a theory is shown to be useful in a wide range of situations, it becomes part of our standard model of the world. But this doesn’t make it true, because we will never know whether future experiments may supersede it. It may well be the case that physical situations will be found where general relativity is supplanted by another theory of gravity. Indeed, physicists already know that Einstein’s theory breaks down when matter is so dense that quantum effects become important. Einstein himself realised that this would probably happen to his theory.

Putting together the material for this book, I was struck by the many parallels between the events of 1919 and coverage of similar topics in the newspapers of 1999. One of the hot topics for the media in January 1999, for example, has been the discovery by an international team of astronomers that distant exploding stars called supernovae are much fainter than had been predicted. To cut a long story short, this means that these objects are thought to be much further away than expected. The inference then is that not only is the Universe expanding, but it is doing so at a faster and faster rate as time passes. In other words, the Universe is accelerating. The only way that modern theories can account for this acceleration is to suggest that there is an additional source of energy pervading the very vacuum of space. These observations therefore hold profound implications for fundamental physics.

As always seems to be the case, the press present these observations as bald facts. As an astrophysicist, I know very well that they are far from unchallenged by the astronomical community. Lively debates about these results occur regularly at scientific meetings, and their status is far from established. In fact, only a year or two ago, precisely the same team was arguing for exactly the opposite conclusion based on their earlier data. But the media don’t seem to like representing science the way it actually is, as an arena in which ideas are vigorously debated and each result is presented with caveats and careful analysis of possible error. They prefer instead to portray scientists as priests, laying down the law without equivocation. The more esoteric the theory, the further it is beyond the grasp of the non-specialist, the more exalted is the priest. It is not that the public want to know – they want not to know but to believe.

Things seem to have been the same in 1919. Although the results from Sobral and Principe had then not received independent confirmation from other experiments, just as the new supernova experiments have not, they were still presented to the public at large as being definitive proof of something very profound. That the eclipse measurements later received confirmation is not the point. This kind of reporting can elevate scientists, at least temporarily, to the priesthood, but does nothing to bridge the ever-widening gap between what scientists do and what the public think they do.

As we enter a new Millennium, science continues to expand into areas still further beyond the comprehension of the general public. Particle physicists want to understand the structure of matter on tinier and tinier scales of length and time. Astronomers want to know how stars, galaxies  and life itself came into being. But not only is the theoretical ambition of science getting bigger. Experimental tests of modern particle theories require methods capable of probing objects a tiny fraction of the size of the nucleus of an atom. With devices such as the Hubble Space Telescope, astronomers can gather light that comes from sources so distant that it has taken most of the age of the Universe to reach us from them. But extending these experimental methods still further will require yet more money to be spent. At the same time that science reaches further and further beyond the general public, the more it relies on their taxes.

Many modern scientists themselves play a dangerous game with the truth, pushing their results one-sidedly into the media as part of the cut-throat battle for a share of scarce research funding. There may be short-term rewards, in grants and TV appearances, but in the long run the impact on the relationship between science and society can only be bad. The public responded to Einstein with unqualified admiration, but Big Science later gave the world nuclear weapons. The distorted image of scientist-as-priest is likely to lead only to alienation and further loss of public respect. Science is not a religion, and should not pretend to be one.

PS. You will note that I was voicing doubts back in 1998 about the interpretation of the early supernova results suggesting that the universe might be accelerating and that dark energy might be the reason for its behaviour. Although more evidence supporting this interpretation has since emerged from WMAP and other sources, I remain sceptical that we cosmologists are on the right track about this. Don’t get me wrong – I think the standard cosmological model is the best working hypothesis we have; I just think we’re probably missing some important pieces of the puzzle. I don’t apologise for that. I think sceptical is what a scientist should be.

Physics and Statistics

Posted in Bad Statistics, Education on August 16, 2013 by telescoper

Predictably, yesterday’s newspapers and other media were full of feeble articles about the A-level results, and I don’t just mean the gratuitous pictures of pretty girls opening envelopes and/or jumping in the air. I’ve never met a journalist who understood the concept of statistical significance, which seems to account for the way they feel able to write whatever they like about any numbers that happen to be newsworthy without feeling constrained by mathematical common sense. Sometimes it’s the ridiculous over-interpretation of opinion polls (which usually have a sampling uncertainty of ±3%), sometimes it’s league tables. This time it’s the number of students getting the top grades at A-level.

The BBC, for example, made a lot of fuss about the fall in the percentage of A and A* A-level grades, to 26.3% this year from 26.6% last year. Anyone with a modicum of statistical knowledge would know, however, that whether this drop means anything at all depends on how many results were involved: for a simple counting argument the fractional sampling uncertainty scales as 1/√N. For a cohort of 300,000 that gives an uncertainty of about 0.18 percentage points in each year’s figure, so the uncertainty on the year-on-year difference is around 0.26 points, comparable to the reported fall of 0.3 points. The result is therefore in the noise – in the sense that there’s no convincing evidence that it was actually harder to get a high grade this year compared with last year – but that didn’t prove a barrier to those editors intent on filling their newspapers and websites with meaningless guff.
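For transparency, here is the back-of-envelope arithmetic, assuming a cohort of 300,000 and the crude 1/√N scaling; a more careful treatment would use the actual number of entries and binomial errors:

```python
N = 300_000                   # assumed cohort size
p_this, p_last = 26.3, 26.6   # percentage of A/A* grades, this year and last

sigma_year = 100.0 / N ** 0.5                # ~0.18 percentage points per year
sigma_diff = (2.0 * sigma_year ** 2) ** 0.5  # uncertainty on the year-on-year change
fall = p_last - p_this

# The 0.3-point fall is comparable to the ~0.26-point uncertainty on the
# difference, i.e. roughly a one-sigma fluctuation.
n_sigma = fall / sigma_diff
```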

Almost hidden among the bilge was an interesting snippet about Physics. It seems that the number of students taking Physics A-level exceeded 35,000 this year. That figure was set as a government target for 2014, so it has been reached a year early. The difference between the number that took Physics this year (35,569) and those who took it in 2006 (27,368) is certainly significant. Whether this is the so-called Brian Cox effect or something else, it’s very good news for the future health of the subject.

On the other hand, the proportion of female Physics students remains around 20%. Over the last three years the proportion has been 20.8%, 21.3% and 20.6% so numerically this year is down on last year, but the real message in these figures is that despite strenuous efforts to increase this fraction, there is no significant change.

As I write I’m formally still on Clearing business, sitting beside the telephone in case anyone needs to talk to me. However, at close of play yesterday the School of Mathematical and Physical Sciences had exceeded its recruitment target by quite a healthy margin.  We’re still open for Clearing, though, as our recent expansion means we can take a few more suitably qualified students. Physics and Astronomy did particularly well, and we’re set to welcome our biggest-ever intake into the first year in September 2013. I’m really looking forward to meeting them all.

While I’m on about statistics, here’s another thing. When I was poring over this year’s NSS results, I noticed that only 39 Physics departments appeared in the survey. When I last counted them there were 115 universities in the UK. This number doesn’t include about 50 colleges and other forms of higher education institutions which are also sometimes included in lists of universities. Anyway, my point is that at most about a third of British universities have a physics department.

Now that is a shocking statistic…

Tuesday’s Child

Posted in Bad Statistics, Cute Problems on July 5, 2013 by telescoper

I came across this little teaser this morning and thought I’d share it here.

I have two children, one of whom is a son born on a Tuesday. What is the probability that I have two boys?

Please select an answer from the possibilities listed in the poll below.

This is not a new problem and you can probably find the answer on the internet very quickly, but please try to work it out yourself before doing so. In other words, try thinking before you google! I’ll add a link to a discussion of this puzzle in due course…

UPDATE: Here’s the discussion that triggered this post. As you can see from the poll, most of you got it wrong!
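For anyone checking their answer against the linked discussion, the puzzle can also be settled by brute-force enumeration. This sketch uses the standard reading of the problem: each child is independently a boy or a girl with probability 1/2, and equally likely to be born on any of the seven days of the week.

```python
from itertools import product
from fractions import Fraction

# Each child is one of 14 equally likely (sex, weekday) combinations,
# so a two-child family is one of 196 equally likely possibilities.
children = list(product(["boy", "girl"], range(7)))
families = list(product(children, children))

# Condition on the stated information: at least one child is a boy
# born on a Tuesday (call Tuesday day 1).
tuesday_boy = ("boy", 1)
consistent = [f for f in families if tuesday_boy in f]
two_boys = [f for f in consistent
            if f[0][0] == "boy" and f[1][0] == "boy"]

answer = Fraction(len(two_boys), len(consistent))  # 13/27
```

The counter-intuitive part is the denominator: conditioning on “a Tuesday-born boy” keeps 27 of the 196 families, not half of them, which is why the answer is neither 1/2 nor 1/3.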

Can We Actually Even Tell if Humans Are Affecting the Climate? What if we did nothing at all?

Posted in Bad Statistics with tags , , on June 26, 2013 by telescoper

Reblog of a post about the doctrine of falsifiability and its relevance to Climate Change… following on from Monday’s post.

Evidence, Absence, and the Type II Monster

Posted in Bad Statistics on June 24, 2013 by telescoper

I was just having a quick lunchtime shufty at Dave Steele‘s blog. His latest post is inspired by the quotation “Absence of Evidence isn’t Evidence of Absence”, which can apparently be traced back to Carl Sagan. I never knew that. Anyway, I was muchly enjoying the piece when I suddenly stumbled into this paragraph, which I quote without permission because I’m too shy to ask:

In a scientific experiment, the null hypothesis refers to a general or default position that there is no relationship between two measured phenomena. For example a well thought out point in an article by James Delingpole. Rejecting or disproving the null hypothesis is the primary task in any scientific research. If an experiment rejects the null hypothesis, it concludes that there are grounds greater than chance for believing that there is a relationship between the two (or more) phenomena being observed. Again the null hypothesis itself can never be proven. If participants treated with a medication are compared with untreated participants and there is found no statistically significant difference between the two groups, it does not prove that there really is no difference. Or if we say there is a monster in a Loch but cannot find it. The experiment could only be said to show that the results were not sufficient to reject the null hypothesis.

I’m going to pick up the trusty sword of Bayesian probability and have yet another go at the dragon of frequentism, but before doing so I’ll just correct the first sentence. The “null hypothesis” in a frequentist hypothesis test is not necessarily of the form described here: it could be of virtually any form, possibly quite different from the stated one of no correlation between two variables. All that matters is that (a) it has to be well-defined in terms of a model and (b) you have to be content to accept it as true unless and until you find evidence to the contrary. It’s true to say that there’s nowt as well-specified as nowt so nulls are often of the form “there is no correlation” or something like that, but the point is that they don’t have to be.

I note that the Wikipedia page on “null hypothesis” uses the same wording as in the first sentence of the quoted paragraph, but this is not what you’ll find in most statistics textbooks. In their compendious three-volume work The Advanced Theory of Statistics, Kendall & Stuart even go so far as to say that the word “null” is misleading precisely because the hypothesis under test might be quite complicated, e.g. of composite nature.

Anyway, whatever the null hypothesis happens to be, the way a frequentist would proceed would be to calculate what the distribution of measurements would be if it were true. If the actual measurement is deemed to be unlikely (say that it is so high that only 1% of measurements would turn out that big under the null hypothesis) then you reject the null, in this case with a “level of significance” of 1%. If you don’t reject it then you tacitly accept it unless and until another experiment does persuade you to shift your allegiance.

But the significance level merely specifies the probability that you would reject the null hypothesis if it were correct. This is what you would call a Type I error. It says nothing at all about the probability that the null hypothesis is actually correct. To make that sort of statement you would need to specify an alternative hypothesis, calculate the distribution of the test statistic under it, and hence determine the statistical power of the test, i.e. the probability that you would actually reject the null hypothesis when it is false. To fail to reject the null hypothesis when it’s actually incorrect is to make a Type II error.

If all this stuff about significance, power and Type I and Type II errors seems a bit bizarre, I think that’s because it is. So is the notion, which stems from this frequentist formulation, that all a scientist can ever hope to do is refute their null hypothesis. You’ll find this view echoed in the philosophical approach of Karl Popper and it has heavily influenced the way many scientists see the scientific method, unfortunately.

The asymmetrical way that the null and alternative hypotheses are treated in the frequentist framework is not helpful, in my opinion. Far better to adopt a Bayesian framework, in which probability represents the extent to which measurements or other data support a given theory. New statistical evidence can make two hypotheses either more or less probable relative to each other. The focus is not just on rejecting a specific model, but on comparing two or more models in a mutually consistent way. The key notion is not falsifiability, but testability. Data that fail to reject a hypothesis can properly be interpreted as supporting it, i.e. by making it more probable, but such reasoning can only be done consistently within the Bayesian framework.

What remains true, however, is that the null hypothesis (or indeed any other hypothesis) can never be proven with certainty; that is the nature of all probabilistic reasoning. Sometimes, though, the weight of supporting evidence is so strong that inductive logic compels us to regard our theory or model or hypothesis as virtually certain. That applies whether the evidence consists of actual measurements or of non-detections; to a Bayesian, absence of evidence can be (and indeed often is) evidence of absence. The sun rises every morning and sets every evening; it is silly to argue that this provides us with no grounds for expecting it to do so tomorrow. Likewise, the sonar surveys and other investigations in Loch Ness provide us with evidence that favours the hypothesis that there isn’t a Monster over virtually every specific Monster hypothesis that has been suggested.
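To illustrate with some made-up numbers: suppose each survey would have detected an existing Monster with some probability, and every survey comes up empty. Bayes’ theorem then steadily shifts the posterior towards absence. The prior and the per-survey detection probability below are entirely hypothetical:

```python
def posterior_after_null_searches(prior, p_detect, n_searches):
    """Posterior probability that the Monster exists after n_searches
    independent searches all fail, assuming each search would detect
    an existing Monster with probability p_detect."""
    like_exists = (1.0 - p_detect) ** n_searches  # P(no detections | Monster)
    like_absent = 1.0                             # P(no detections | no Monster)
    num = prior * like_exists
    return num / (num + (1.0 - prior) * like_absent)

# A generous 50:50 prior and a modest 30% detection chance per survey:
# after ten empty surveys the posterior has dropped below 3%.
p10 = posterior_after_null_searches(0.5, 0.3, 10)
```

Each non-detection is genuine evidence, because a real Monster would have made the string of failures less probable; that is precisely what the frequentist “fail to reject” language obscures.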

It is perfectly sensible to use this reasoning to infer that there is no Loch Ness Monster. Probably.

Bunn on Bayes

Posted in Bad Statistics with tags , , , , on June 17, 2013 by telescoper

Just a quickie to advertise a nice blog post by Ted Bunn in which he takes down an article in Science by Bradley Efron, which is about frequentist statistics. I’ll leave it to you to read his piece, and the offending article, but couldn’t resist nicking his little graphic that sums up the matter for me:

[Graphic from Ted Bunn's blog post]

The point is that as scientists we are interested in the probability of a model (or hypothesis) given the evidence (or data) arising from an experiment (or observation). This requires inverse, or inductive, reasoning and it is therefore explicitly Bayesian. Frequentists focus on a different question, about the probability of the data given the model, which is not the same thing at all, and is not what scientists actually need. There are examples in which a frequentist method accidentally gives the correct (i.e. Bayesian) answer, but such methods are nevertheless still answering the wrong question.
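A hypothetical numerical example (all figures assumed for illustration) shows why the two questions can have very different answers: even when the probability of the data given the null is small, the probability of the null given the data can remain large if the null is a priori probable.

```python
# All numbers are assumed for illustration only.
p_h0 = 0.99             # prior probability that the null ("no effect") is true
p_sig_given_h0 = 0.05   # probability of a "significant" result if H0 is true
p_sig_given_h1 = 0.80   # power: probability of a significant result if H1 is true

# Posterior probability that the null is true despite a significant result:
num = p_sig_given_h0 * p_h0
post_h0 = num / (num + p_sig_given_h1 * (1.0 - p_h0))
print(f"P(H0 | significant result) = {post_h0:.2f}")
```

With these assumed numbers a "significant at the 5% level" result still leaves the null hypothesis about 86% probable, which is precisely the confusion between P(data|model) and P(model|data) that the graphic lampoons.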

I will make one further comment arising from the following excerpt from the Efron piece.

Bayes’ 1763 paper was an impeccable exercise in probability theory. The trouble and the subsequent busts came from overenthusiastic application of the theorem in the absence of genuine prior information, with Pierre-Simon Laplace as a prime violator.

I think this is completely wrong. There is always prior information, even if it is minimal, but the point is that frequentist methods always ignore it even if it is “genuine” (whatever that means). It’s not always easy to encode this information in a properly defined prior probability, of course, but at least a Bayesian will not deliberately answer the wrong question in order to avoid thinking about it.

It is ironic that the pioneers of probability theory, Laplace among them, adopted a Bayesian rather than a frequentist interpretation of their probabilities. Frequentism arose during the nineteenth century and held sway until recently. I recall giving a conference talk about Bayesian reasoning only to be heckled by the audience with comments about “new-fangled, trendy Bayesian methods”. Nothing could have been less apt. Probability theory pre-dates the rise of sampling theory and all the frequentist-inspired techniques that modern-day statisticians like to employ and which, in my opinion, have added nothing but confusion to the scientific analysis of statistical data.

IQ in different academic fields – Interesting? Quite!

Posted in Bad Statistics with tags , , , on May 26, 2013 by telescoper

You all know how much I detest league tables, especially those that are based on entirely arbitrary criteria but nevertheless promote a feeling of smug self-satisfaction for those lucky enough to find themselves at the top. So when my attention was drawn to a blog post that shows (or purports to show) the variation of average IQ across different academic disciplines, I decided to post the corresponding ranking with the usual health warning that IQ tests only measure a subject’s ability to do IQ tests. The list isn’t even based on IQ test results per se, but on a conversion from Graduate Record Examination (GRE) results to IQ, which may be questionable. Moreover, the differences are really rather small and (as usual) no estimate of sampling uncertainty is provided.

Does this list mean that physicists are smarter than anyone else? You might say that. I couldn’t possibly comment…

  • 130.0 Physics
  • 129.0 Mathematics
  • 128.5 Computer Science
  • 128.0 Economics
  • 127.5 Chemical engineering
  • 127.0 Material science
  • 126.0 Electrical engineering
  • 125.5 Mechanical engineering
  • 125.0 Philosophy
  • 124.0 Chemistry
  • 123.0 Earth sciences
  • 122.0 Industrial engineering
  • 122.0 Civil engineering
  • 121.5 Biology
  • 120.1 English/literature
  • 120.0 Religion/theology
  • 119.8 Political science
  • 119.7 History
  • 118.0 Art history
  • 117.7 Anthropology/archeology
  • 116.5 Architecture
  • 116.0 Business
  • 115.0 Sociology
  • 114.0 Psychology
  • 114.0 Medicine
  • 112.0 Communication
  • 109.0 Education
  • 106.0 Public administration
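To illustrate the missing sampling uncertainty, here is a back-of-the-envelope sketch with an assumed population standard deviation of 15 IQ points and invented sample sizes (the actual sample sizes behind the table are not stated anywhere):

```python
from math import sqrt

sigma = 15.0  # assumed individual standard deviation of IQ-style scores

for n in (100, 1000):
    se_mean = sigma / sqrt(n)      # standard error of one field's mean
    se_diff = sqrt(2.0) * se_mean  # standard error of the difference of two means
    print(f"n = {n:5d} per field: s.e. of a difference = {se_diff:.2f} IQ points")
```

Unless each field's average were based on well over a thousand test-takers, many of the half-point gaps in the ranking would be comparable to, or smaller than, the sampling noise, which is exactly why quoting them without error bars is misleading.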