Science’s Dirtiest Secret?

My attention was drawn yesterday to an article about the role of statistics in science, in a journal I never read called American Scientist. Since this is a theme I’ve blogged about before, I had a quick look at the piece and soon came to the conclusion that it was excruciating drivel. However, looking at it again today, my opinion of it has changed. I still don’t think it’s very good, but it didn’t make me as cross the second time around. I don’t know whether this is because I was in a particularly bad mood yesterday, or whether the piece has been edited. But although it didn’t make me want to scream this time, I still think it’s a poor article.

Let me start with the opening couple of paragraphs:

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

In terms of historical accuracy, the author, Tom Siegfried, gets off to a very bad start. Science didn’t get “seduced” by statistics.  As I’ve already blogged about, scientists of the calibre of Gauss and Laplace – and even Galileo – were instrumental in inventing statistics.

And what were the “modes of calculation that had long served so faithfully” anyway? Scientists have long recognised the need to understand the behaviour of experimental errors, and to incorporate the corresponding uncertainty into their analyses. Statistics isn’t a “mutant form of math”; it’s an integral part of the scientific method, and a perfectly sound discipline – provided you know what you’re doing…

And that’s where, despite the sloppiness of his argument, I do have some sympathy with some of what Siegfried says. What has happened, in my view, is that too many people use statistical methods “off the shelf” without thinking about what they’re doing. The result is that bad use of statistics is widespread. This is particularly true in disciplines that don’t have a well-developed mathematical culture, such as some areas of the biosciences and medicine, although the physical sciences have their own share of horrors too.

I’ve had a run-in myself with the authors of a paper in neurobiology who based extravagant claims on an inappropriate statistical analysis.

What is wrong is therefore not the use of statistics per se, but the fact that too few people understand – or probably even think about – what they’re trying to do (other than publish papers).

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Quite, but what does this mean for “science’s dirtiest secret”? Not that it involves statistical reasoning, but that large numbers of scientists haven’t a clue what they’re doing when they do a statistical test. And if this is the case with practising scientists, how can we possibly expect the general public to make sense of what is being said by the experts? No wonder people distrust scientists when so many results, confidently announced on the basis of totally spurious arguments, turn out to be wrong.

The problem is that the “standard” statistical methods shouldn’t be “standard”. It’s true that there are many methods that work in a wide range of situations, but simply assuming they will work in any particular one without thinking about it very carefully is a very dangerous strategy. Siegfried discusses examples where the use of “p-values” leads to incorrect results. It doesn’t surprise me that such examples can be found, as the misinterpretation of p-values is rife even in numerate disciplines, and matters get worse for those practitioners who combine p-values from different studies using meta-analysis, a method which has no mathematical motivation whatsoever and which should be banned. So indeed should a whole host of other frequentist methods, which offer limitless opportunities to make a complete botch of the data arising from a research project.
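
Just to make concrete what “combining p-values” typically involves, here is a sketch of one standard recipe, Fisher’s combined probability test, assuming k independent studies each reporting a p-value that is uniformly distributed under the null hypothesis:

$$ X^2 = -2\sum_{i=1}^{k}\ln p_i \;\sim\; \chi^2_{2k} \quad \text{under the null.} $$

Each study contributes a single term, with no explicit account taken of how well it was designed or carried out – exactly the sort of thing that goes unexamined when recipes like this are applied off the shelf.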

Siegfried goes on:

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical.

Any single scientific study, alone, is quite likely to be incorrect. Really? Well, yes, if it is done incorrectly. But the point is not that such studies are incorrect because they use statistics; they are incorrect because they are done incorrectly. Many scientists don’t even understand the statistics well enough to realise that what they’re doing is wrong.

If I had my way, scientific publications – especially in disciplines that impact directly on everyday life, such as medicine – should adopt a much more rigorous policy on statistical analysis and on the way statistical significance is reported. I favour the setting up of independent panels whose responsibility is to do the statistical data analysis on behalf of those scientists who can’t be trusted to do it correctly themselves.

Having started badly and lost its way in the middle, the article ends disappointingly too. Having led us through a wilderness of failed frequentist analyses, Siegfried finally arrives at a discussion of the superior Bayesian methodology, but in irritatingly half-hearted fashion.

But Bayesian methods introduce a confusion into the actual meaning of the mathematical concept of “probability” in the real world. Standard or “frequentist” statistics treat probabilities as objective realities; Bayesians treat probabilities as “degrees of belief” based in part on a personal assessment or subjective decision about what to include in the calculation. That’s a tough placebo to swallow for scientists wedded to the “objective” ideal of standard statistics….

Conflict between frequentists and Bayesians has been ongoing for two centuries. So science’s marriage to mathematics seems to entail some irreconcilable differences. Whether the future holds a fruitful reconciliation or an ugly separation may depend on forging a shared understanding of probability.

The difficulty with this piece as a whole is that it reads as an anti-science polemic: “Some science results are based on bad statistics, therefore statistics is bad and science that uses statistics is bogus.” I don’t know whether that’s what the author intended, or whether it was just badly written.

I’d say the true state of affairs is different. A lot of bad science is published, and a lot of that science is bad because it uses statistical reasoning badly. You wouldn’t, however, argue that a screwdriver is no use because some idiot tries to hammer a nail in with one.

Only a bad craftsman blames his tools.

13 Responses to “Science’s Dirtiest Secret?”

  1. I must admit that I get a little depressed about the lack of knowledge of how to handle data and statistical inference amongst undergraduates and graduates. We make grand statements about the scientific method being the comparison between theory and observations, but never really tell them what “comparison” means. It really needs to be threaded through the undergraduate program, but is seriously lacking (we have a Bayesian course for our 4th years, but it’s too little, too late).

    I was actually thinking of writing a booklet about it to give to our undergrads.

  2. I find it strange that the author considers the Bayesian notion of a prior a “tough placebo to swallow for scientists wedded to the ‘objective’ ideal of standard statistics”. The typical sentiment amongst the people I work with is that if your conclusion depends so strongly on the prior as to make you uncomfortable, that’s just a sign you need better data.

  3. Anton Garrett Says:

    He’s almost right when he says that “widespread misuse of statistical methods makes science more like a crapshoot.” Had he said “widespread use of ad hoc statistical methods makes some science more like a crapshoot” he would have been spot-on. Which raises the question of right and wrong statistical methods. Peter and I are Bayesians, not frequentists, but even the word ‘Bayesian’ has too many shades of meaning. (Some people even refer to ‘Bayesian methods’ for constructing an estimator in frequentist methodology!)

    If you consider a number to represent how strongly one binary proposition (A) is implied to be true upon assuming another (B) is true, according to known relations between the referents of A and B, then the Boolean algebra of propositions induces an algebra for the numbers. That algebra turns out to be the sum and product rules; they were derived like this by an unsung hero, RT Cox, in 1946. This concept is what you actually want in any problem where you are dealing with uncertainty, and since it obeys “the laws of probability” you might as well call it probability – but if anybody objects, don’t waste time arguing over definitions, just say that you will calculate this number for the proposition of interest and solve the problem. (NB No mention of ‘belief’.) In quantitative problems you can define densities via propositions such as “the parameter lies between x and x+dx”.

    Bayes’ theorem – a trivial consequence of the sum and product rules – tells you how to revise the degree of implication when fresh data, having a bearing on the problem, come in. But where do you begin before you have any data; what is the “prior density”? The Bayesian view has been criticised as arbitrary for supposedly not addressing this issue, but the criticism is misplaced. If, for example, you have no data telling you whereabouts on a circular wire a bead is located, you must by symmetry assign equal likeliness to every location – a uniform prior. Symmetry arguments can help in more complex situations too, although not all such problems are solved; this is a research front today.
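
    In symbols – a sketch, writing p(A|B) for the number that measures how strongly B implies A – the rules referred to above are

    $$ p(A|C) + p(\bar{A}|C) = 1, \qquad p(AB|C) = p(A|BC)\,p(B|C), $$

    and rearranging the product rule (AB and BA being the same proposition) gives Bayes’ theorem:

    $$ p(A|BC) = \frac{p(B|AC)\,p(A|C)}{p(B|C)}. $$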

    Moreover it is a strength of Bayesian methods, not a weakness, that they take prior information into account. Suppose that you have prior info corresponding to certainty that a parameter takes a particular value. Then your prior density is a delta function at that value. Bayes’ theorem, which gets the posterior density by multiplying the prior by the likelihood and renormalising, tells you that the posterior density is also a delta function at that value – just as intuition demands. All deviations from that value, in the data, are ascribed to noise. Frequentist methods, in contrast, put plenty of probability where you know with certainty that it cannot be! These methods were dreamed up by bright people for use with datasets of particular types, for which their answer is close to the Bayesian, but ultimately they are ad hoc and wrong.
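
    To spell that example out – a sketch, with θ the parameter, D the data and θ₀ the value known with certainty, so that the prior is δ(θ − θ₀):

    $$ p(\theta|D) = \frac{p(D|\theta)\,\delta(\theta-\theta_0)}{\int p(D|\theta')\,\delta(\theta'-\theta_0)\,d\theta'} = \frac{p(D|\theta_0)\,\delta(\theta-\theta_0)}{p(D|\theta_0)} = \delta(\theta-\theta_0), $$

    so the posterior is the same delta function whatever data you collect, just as intuition demands.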

    The Bayesian way outlined here is applicable to one-off events (what is the probability density for the mass of the universe?) and equally to repeated trials. It is therefore more general than frequentism, too. Frequentists would do well to ponder what the word “random” really means. We might be better off without it.

    Anton

  4. Thomas D Says:

    I remember Tom Siegfried vaguely from my time at Michigan. He does make a decent stab at reporting science, rather than just rewriting press releases about goofy ‘discoveries’.

    However, journalists don’t have a very good record of interpreting probabilities either…

    Bayesian methods can be abused just like any other. There is the good Bayesian, who doesn’t raise his personal tastes in theory to the status of prior information… and the bad Bayesian, who quotes results in the abstract of a paper that turn out, when you read carefully, to depend strongly on a personal choice of prior.

    Some Bayes fans like to say that their results do indeed accurately represent ‘our’ beliefs (those of the people deriving them?): a problem may arise when people with slightly different prior beliefs try to read the paper, or use the results. It can turn into a reductio ad absurdum which actually justifies judging a paper by its author list.

  5. > a problem may arise when people with slightly different prior beliefs try to read the paper, or use the results.

    If you have a different prior, you will get a different result, unless overwhelmed by the data. If two different priors give significantly different results, get more data. It isn’t that hard.

  6. I think statistical reasoning is the essence of good scientific thought. Galileo et al. introduced the idea that we need to read the book of Nature critically; the work of Gauss and Laplace gave us the tools to do that rigorously. It is precisely about answering the question “am I fooling myself?”

  7. Thomas D Says:

    Thanks for your insulting condescension ‘Cusp’. If the prior makes no difference to the result, why bother with it in the first place?

    Two different priors may well give significantly different results: for example if one is well justifiable by physical arguments and the other less so.

    In gravitational wave detection we are often faced with the question: what are the relative probabilities that an apparent signal of given strength is due to an astrophysical signal, or to a detector artefact? Since the strength of a GW signal is inversely proportional to the distance, it makes some sense to take an inverse cube prior on the signal strength. This corresponds to a uniform distribution of sources over volume. However, if we know something about the nearby galaxy distribution, which is not uniform over volume, we can use that as a prior. But that isn’t perfect either since one type of galaxy may be more likely than another to contain sources…
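
    To make the scaling explicit – a sketch, assuming sources spread uniformly through Euclidean volume and an amplitude h falling off as 1/r:

    $$ N(<r) \propto r^3, \qquad h \propto \frac{1}{r} \;\Longrightarrow\; N(>h) \propto h^{-3}, $$

    which is one way of reading the inverse-cube prior mentioned above: the prior weight piles up at small amplitudes, so weak candidate signals are favoured a priori over loud ones.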

    If you are hoping to claim a discovery, which almost by definition means the observational data are not yet overwhelmingly informative, then the difference between one prior and another may be crucial.

  8. If I had my way, scientific publications – especially in disciplines that impact directly on everyday life, such as medicine – should adopt a much more rigorous policy on statistical analysis and on the way statistical significance is reported. I favour the setting up of independent panels whose responsibility is to do the statistical data analysis on behalf of those scientists who can’t be trusted to do it correctly themselves

    Actually, in the medical sciences it’s fairly common practice to consult with a professional statistician on data analysis and interpretation for research studies. Maybe not “common practice” enough given that things still go wrong – although I’d argue that the media and lawyers have something to do with that too.

    But in my experience, medical scientists are actually far more aware of their lack of expertise in statistics than, say, physicists or astronomers – who do it all themselves, and often badly.

  9. I would be very interested to hear your opinion on why there is no mathematical motivation for meta-analysis.

    I would appreciate a detailed entry from you on this topic, and more generally on some of the statistical methods used in the biological sciences that you think conflict with the scientific method.

    Thanks,

  10. Thomas – I was not intending to sound insulting, but I stand by my comments. Priors are important, and should be used; if generally uninformative priors are used, then small amounts of data can lead to different conclusions from the same data, depending on the prior. More data with the same general priors will result in the same conclusions.

    More specific priors will allow a judgment of which is the better description of the data (by looking at the evidence). With too little data such a choice generally can’t be made, whereas more data can effectively rule one option out.
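
    For concreteness – a sketch, with D the data, θ the parameters and M1, M2 two competing choices of prior/model – the evidence referred to above is the marginal likelihood, and the comparison goes through the posterior odds:

    $$ Z_i = \int p(D|\theta, M_i)\,p(\theta|M_i)\,d\theta, \qquad \frac{p(M_1|D)}{p(M_2|D)} = \frac{Z_1}{Z_2}\,\frac{p(M_1)}{p(M_2)}. $$

    With too little data the ratio Z_1/Z_2 typically sits close to unity and cannot discriminate; with enough data it will usually swing decisively one way or the other.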

    I guess I am tired of the whooooo-spoooooky view of priors.

  11. Anton Garrett Says:

    @H:

    Meta-analysis runs something like this: “10 experiments find an effect; 6 find no effect; so, regardless of the fact that some experiments used string and sealing wax and others used state-of-the-art cryogenic apparatus, we shall assign them all equal weight and say the probability that the effect is real is 10/16”.

    This is, obviously, a parody – but the problem I am highlighting does not go away in real meta-analysis; it simply gets brushed under the carpet of the uncertainties involved in any probabilistic calculation. In particular, different experiments that address the same issue of (say) a parameter value might marginalise out totally different sets of intermediate experimental parameters. The only correct way to deal with that is to have a joint pdf for both sets and the parameter of interest, combine the data, and then marginalise out appropriately. I know of no meta-analytical technique which can do that.
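
    In symbols – a sketch, with θ the parameter of interest, ν1 and ν2 the intermediate parameters peculiar to the two experiments, and D1, D2 their data, assumed independent given the parameters – the correct combination is

    $$ p(\theta,\nu_1,\nu_2|D_1,D_2) \propto p(D_1|\theta,\nu_1)\,p(D_2|\theta,\nu_2)\,p(\theta,\nu_1,\nu_2), $$

    followed by

    $$ p(\theta|D_1,D_2) = \iint p(\theta,\nu_1,\nu_2|D_1,D_2)\,d\nu_1\,d\nu_2; $$

    the marginalisation happens only after the data have been combined in the joint posterior, not study by study.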

    Anton

  12. telescoper Says:

    Cusp,

    I think there are too many people who say they’re Bayesian but are scared of priors. I like priors. The only thing that bothers me is when people blindly make them uniform when that makes no sense at all. It is true that eventually, if you have enough data, everything becomes dominated by the likelihood. But mostly we’re not in that Nirvana. What we should do is reason consistently with what we have, and put our assumptions on the table.

    Peter

  13. A problem I often see in reports about scientific research is that the research indicates that A is caused by B, when it is obvious that there are several other possible explanations, besides B causing A, for the correlation. This is not, of course, a fault of the mathematics, but of the people using it or reporting it.
