Archive for the Bad Statistics Category

Oh what a tangled web we weave…

Posted in Bad Statistics on March 11, 2013 by telescoper

…when first we practise frequentist statistics!

I couldn’t resist a quick post directing you to a short paper on the arXiv with the following abstract:

I use archival data to measure the mass of the central black hole in NGC 4526, M = (4.70 ± 0.14) × 10^8 Msun. This 3% error bar is the most precise for an extra-galactic black hole and is close to the precision obtained for Sgr A* in the Milky Way. The factor 7 improvement over the previous measurement is entirely due to correction of a mathematical error, an error that I suggest may be common among astronomers.

The “mathematical error” quoted in the abstract involves using chi-squared-per-degree-of-freedom where chi-squared itself is called for – chi-squared being, in turn, a stand-in for the full likelihood function, which is itself a stand-in for the proper, Bayesian, posterior probability. The best way to avoid such confusion is to do things properly in the first place. That way you can also fold in errors on the distance to the black hole, etc etc…
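
To see how much difference this particular slip can make, here is a minimal sketch (a toy example of my own, not the analysis in the paper): when fitting a single parameter, the 1σ interval should be defined by Δχ² = 1, whereas demanding that the reduced chi-squared rise by one is equivalent to Δχ² = ν and so inflates the error bar by roughly √ν, where ν is the number of degrees of freedom.

```python
# Toy illustration (not the paper's data or model) of the chi-squared
# versus chi-squared-per-degree-of-freedom confusion. For one fitted
# parameter, the 1-sigma interval is where chi2 exceeds its minimum by 1;
# using Delta(chi2/nu) = 1 instead corresponds to Delta(chi2) = nu, which
# widens the interval by about sqrt(nu).

import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 50 measurements of a constant signal with unit errors.
n, mu_true, sigma = 50, 10.0, 1.0
y = rng.normal(mu_true, sigma, size=n)

mu_grid = np.linspace(8.5, 11.5, 3001)
chi2 = np.array([np.sum(((y - mu) / sigma) ** 2) for mu in mu_grid])
chi2_min = chi2.min()
nu = n - 1                                   # degrees of freedom

def half_width(delta):
    """Half-width of the region where chi2 <= chi2_min + delta."""
    accepted = mu_grid[chi2 <= chi2_min + delta]
    return 0.5 * (accepted.max() - accepted.min())

correct = half_width(1.0)                    # Delta chi2 = 1 (the proper rule)
inflated = half_width(float(nu))             # Delta (chi2/nu) = 1 by mistake

print(f"correct 1-sigma error      : {correct:.3f}")
print(f"reduced-chi2 'error'       : {inflated:.3f}")
print(f"inflation factor ~ sqrt(nu): {np.sqrt(nu):.1f}")
```

With ν around 50 the inflation factor is about 7, the same order as the improvement quoted in the abstract, though the paper's actual degrees of freedom may of course differ.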

Bayes in the dock (again)

Posted in Bad Statistics on February 28, 2013 by telescoper

This morning on Twitter there appeared a link to a blog post reporting that the Court of Appeal had rejected the use of Bayesian probability in legal cases. I recommend anyone interested in probability to read it, as it gives a fascinating insight into how poorly the concept is understood.

Although this is a new report about a new case, it’s actually not an entirely new conclusion. I blogged about a similar case a couple of years ago, in fact. The earlier story concerned an erroneous argument given during a trial about the significance of a match between a footprint found at a crime scene and footwear belonging to a suspect. The judge took exception to the fact that the figures being used were not known sufficiently accurately to make a reliable assessment, and thus decided that Bayes’ theorem shouldn’t be used in court unless the data involved in its application were “firm”.

If you read the Guardian article to which I’ve provided a link you will see that there’s a lot of reaction from the legal establishment and statisticians about this, focussing on the forensic use of probabilistic reasoning. This all reminds me of the tragedy of the Sally Clark case and what a disgrace it is that nothing has been done since then to prevent the misrepresentation of statistical arguments in trials. Some of my Bayesian colleagues have expressed dismay at the judge’s opinion.

My reaction to this affair is more muted than you would probably expect. First thing to say is that this is really not an issue relating to the Bayesian versus frequentist debate at all. It’s about a straightforward application of Bayes’ theorem which, as its name suggests, is a theorem; actually it’s just a straightforward consequence of the sum and product laws of the calculus of probabilities. No-one, not even the most die-hard frequentist, would argue that Bayes’ theorem is false. What happened in this case is that an “expert” applied Bayes’ theorem to unreliable data and by so doing obtained misleading results. The  issue is not Bayes’ theorem per se, but the application of it to inaccurate data. Garbage in, garbage out. There’s no place for garbage in the courtroom, so in my opinion the judge was quite right to throw this particular argument out.

But while I’m on the subject of using Bayesian logic in the courts, let me add a few wider comments. First, I think that Bayesian reasoning provides a rigorous mathematical foundation for the process of assessing quantitatively the extent to which evidence supports a given theory or interpretation. As such it describes accurately how scientific investigations proceed by updating probabilities in the light of new data. It also describes how a criminal investigation works.

What Bayesian inference is not good at is achieving closure in the form of a definite verdict. There are two sides to this. One is that the maxim “innocent until proven guilty” cannot be incorporated in Bayesian reasoning. If one assigns a zero prior probability of guilt then no amount of evidence will be able to change this into a non-zero posterior probability; the required burden is infinite. On the other hand, there is the problem that the jury must decide guilt in a criminal trial “beyond reasonable doubt”. But how much doubt is reasonable, exactly? And will a jury understand a probabilistic argument anyway?
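
For what it’s worth, here is a minimal sketch (with entirely made-up numbers, not figures from any real case) of the kind of update Bayes’ theorem performs in a simple guilty-versus-innocent setting, and of the point about the zero prior: once the prior probability of guilt is set exactly to zero, no evidence, however strong, can move the posterior away from zero.

```python
# Illustrative only: hypothetical numbers, not from any real trial.
# Bayes' theorem for two exhaustive hypotheses, "guilty" and "innocent",
# updated on a piece of forensic evidence ("the footwear matches").

def posterior_guilt(prior, p_match_if_guilty, p_match_if_innocent):
    """Posterior probability of guilt given a match, by Bayes' theorem."""
    numerator = p_match_if_guilty * prior
    denominator = numerator + p_match_if_innocent * (1.0 - prior)
    return numerator / denominator

# Hypothetical figures: the footwear would certainly match the true culprit,
# and would match roughly 1 person in 500 chosen at random.
p_match_if_guilty = 1.0
p_match_if_innocent = 1.0 / 500.0

for prior in (0.5, 0.01, 0.0):
    post = posterior_guilt(prior, p_match_if_guilty, p_match_if_innocent)
    print(f"prior P(guilty) = {prior:4.2f}  ->  posterior = {post:.4f}")

# prior 0.50 -> posterior ~ 0.998   (strong evidence, strong update)
# prior 0.01 -> posterior ~ 0.835
# prior 0.00 -> posterior = 0.000   (a literal zero prior can never be updated)
```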

In pure science we never really need to achieve this kind of closure, collapsing the broad range of probability into a simple “true” or “false”, because science is a process of continual investigation. It’s a reasonable inference, for example, based on supernovae and other observations, that the expansion of the Universe is accelerating. But is it proven that this is so? I’d say “no”, and I don’t think my doubts are at all unreasonable…

So what I’d say is that while statistical arguments are extremely important for investigating crimes – narrowing down the field of suspects, assessing the reliability of evidence, establishing lines of inquiry, and so on – I don’t think they should ever play a central role once the case has been brought to court unless there’s much clearer guidance given to juries and stricter monitoring of so-called “expert” witnesses.

I’m sure various readers will wish to express diverse opinions on this case so, as usual, please feel free to contribute through the box below!

REF moves the goalposts (again)

Posted in Bad Statistics, Education, Science Politics on January 18, 2013 by telescoper

The topic of the dreaded 2014 Research Excellence Framework came up quite a few times in quite a few different contexts over the last few days, which reminded me that I should comment on a news item that appeared a week or so ago.

As you may or may not be aware, the REF is meant to assess the excellence of university departments in various disciplines and distribute its “QR” research funding accordingly. Institutions complete submissions which include details of relevant publications etc, and then a panel sits in judgement. I’ve already blogged about all this: the panels clearly won’t have time to read every paper submitted in any detail at all, so the outcome is likely to be highly subjective. Moreover, HEFCE’s insane policy of awarding the bulk of its research funds to only the very highest grade (4* – “world-leading”) means that small variations in judged quality will turn into enormous discrepancies in the level of research funding. The whole thing is madness, but there seems no way to inject sanity into the process as the deadline for submissions remorselessly approaches.

Now another wrinkle has appeared on the already furrowed brows of those preparing REF submissions. The system allows departments to select staff to be entered; it’s not necessary for everyone to go in. Indeed if only the very best researchers are entered then the typical score for the department will be high, so it will appear higher up in the league tables, and since the cash goes primarily to the top dogs this might produce almost as much money as including a few less highly rated researchers.

On the other hand, this is a slightly dangerous strategy because it presupposes that one can predict which researchers and what research will be awarded the highest grade. A department will come a cropper if all its high fliers are deemed by the REF panels to be turkeys.

In Wales there’s something that makes this whole system even more absurd, which is that it’s almost certain that there will be no QR funding at all. Welsh universities are spending millions preparing for the REF despite the fact that they’ll get no money even if they do stunningly well. The incentive in Wales is therefore even stronger than it is in England to submit only the high-fliers, as it’s only the position in the league tables that will count.

The problem with a department adopting a very selective strategy is that it could have a very negative effect on the career development of younger researchers who are not included in their department’s REF submission. There is also the risk that people who manage to convince their Head of School that they are bound to get four stars in the REF may not have the same success with the various grey eminences who make the decision that really matters.

Previous incarnations of the REF (namely the Research Assessment Exercises of 2008 and 2001) did not publish explicit information about exactly how many eligible staff were omitted from the submissions, largely because departments were extremely creative in finding ways of hiding staff they didn’t want to include.

Now, however, it appears there are plans for the Higher Education Statistics Agency (HESA) to publish its own figures on how many staff it thinks are eligible for inclusion in each department. I’m not sure how accurate these figures will be, but they will change the game, in that they will allow compilers of league tables to draw up lists of the departments that prefer playing games to just allowing the REF panels to judge the quality of their research.

I wonder how many universities are hastily revising their submission plans in the light of this new twist?

Society Counts, and so do Astronomers!

Posted in Bad Statistics, Science Politics on December 6, 2012 by telescoper

The other day I received an email from the British Academy (for Humanities and Social Sciences) announcing a new position statement on what they call Quantitative Skills.  The complete text of this statement, which is entitled Society Counts and which is well worth reading,  is now  available on the British Academy website.

Here’s an excerpt from the letter accompanying the document:

The UK has a serious deficit in quantitative skills in the social sciences and humanities, according to a statement issued today (18 October 2012) by the British Academy. This deficit threatens the overall competitiveness of the UK’s economy, the effectiveness of public policy-making, and the UK’s status as a world leader in research and higher education.

The statement, Society Counts, raises particular concerns about the impact of this skills deficit on the employability of young people. It also points to serious consequences for society generally. Quantitative skills enable people to understand what is happening to poverty, crime, the global recession, or simply when making decisions about personal investment or pensions.

Citing a recent survey of MPs by the Royal Statistical Society’s getstats campaign – in which only 17% of Conservative and 30% of Labour MPs thought politicians use official statistics and figures accurately when talking about their policies – Professor Sir Adam Roberts, President of the British Academy, said: “Complex statistical and analytical work on large and complex data now underpins much of the UK’s research, political and business worlds. Without the right skills to analyse this data properly, government professionals, politicians, businesses and most of all the public are vulnerable to misinterpretation and wrong decision-making.”

The statement clearly identifies a major problem, not just in the Humanities and Social Sciences but throughout academia and wider society. I even think the British Academy might be a little harsh on its own constituency because, with a few notable exceptions, statistics and other quantitative data analysis methods are taught very poorly to science students too. Just the other day I was talking to an undergraduate student, who is thinking about doing a PhD in physics, about what that’s likely to entail. I told him that the one thing he could be pretty sure he’d have to cope with is analysing data statistically. Like most physics departments, however, we don’t run any modules on statistical techniques and only the bare minimum is covered in the laboratory sessions. Why? I think it’s because there are too few staff who would be able to teach such material competently (because they don’t really understand it themselves).

Here’s a paragraph from the British Academy statement:

There is also a dearth of academic staff able to teach quantitative methods in ways that are relevant and exciting to students in the social sciences and humanities. As few as one in ten university social science lecturers have the skills necessary to teach a basic quantitative methods course, according to the report. Insufficient curriculum time is devoted to methodology in many degree programmes.

Change “social sciences and humanities” to “physics” and I think that statement would still be correct. In fact I think “one in ten” would be an overestimate.

The point is that although  physics is an example of a quantitative discipline, that doesn’t mean that the training in undergraduate programmes is adequate for the task. The upshot is that there is actually a great deal of dodgy statistical analysis going on across a huge number of disciplines.

So what is to be done? I think the British Academy identifies only part of the required solution. Of course better training in basic numeracy at school level is needed, but it shouldn’t stop there. I think there also needs to be a wider exchange of knowledge and ideas across disciplines and a greater involvement of expert consultants. I think this is more likely to succeed than getting more social scientists to run standard statistical analysis packages. In my experience, most bogus statistical analyses do not result from using the method wrong, but from using the wrong method…

A great deal of astronomical research is based on inferences drawn from large and often complex data sets, so astronomy is a discipline with a fairly enlightened attitude to statistical data analysis. Indeed, many important contributions to the development of statistics were made by astronomers. In the future I think we’ll  see many more of the astronomers working on big data engage with the wider academic community by developing collaborations or acting as consultants in various ways.

We astronomers are always being challenged to find applications of our work outside the purely academic sphere, and this is one that could be developed much further than it has been so far. It disappoints me that we always seem to think of this exclusively in terms of technological spin-offs, while the importance of transferable expertise is often neglected. Whether you’re a social scientist or a physicist, if you’ve got problems analysing your data, why not ask an astronomer?

If physicists analysed election results…

Posted in Bad Statistics, Politics, The Universe and Stuff on November 7, 2012 by telescoper

I think this is a wonderfully sharp satirical take on pollsters, physicists and statistics…

Reblogged from Freak of Nature:

Mainstream media outlets around the world have declared Barack Obama the victor in yesterday’s US presidential elections, but particle physicists at CERN say that the race is still too close to call.

With every state except Florida reporting, the New York Times announced that Obama had won the popular vote and easily gained the electoral college points needed to win re-election. The Princeton Election Consortium put the probability of Obama’s victory at 99.2%.

But that confidence level is still several standard deviations away from the point at which particle physicists would be willing to declare the next president. According to the norms of the field, pollsters would have to be 99.99995% confident that Obama had won before physicists would be willing to call the race.

“All we can say right now is there is some evidence that Barack Obama will return to the White House in January,” says Marcus Georgio…

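For the record, here is roughly where the 99.99995% figure quoted above comes from: it is (approximately) the probability content within five standard deviations of a normal distribution, the particle physicists’ conventional discovery threshold. A quick sketch of the arithmetic (my own, not from the original post):

```python
# My own sketch: confidence levels corresponding to n-sigma thresholds of a
# normal distribution; five sigma is the conventional particle-physics
# "discovery" criterion.

from scipy.stats import norm

for n_sigma in (1, 2, 3, 5):
    one_sided = 1 - norm.sf(n_sigma)         # P(Z < n_sigma)
    two_sided = 1 - 2 * norm.sf(n_sigma)     # P(|Z| < n_sigma)
    print(f"{n_sigma} sigma: one-sided {100 * one_sided:.5f}%, "
          f"two-sided {100 * two_sided:.5f}%")

# At 5 sigma these come out at about 99.99997% and 99.99994% respectively,
# i.e. roughly the 99.99995% quoted in the post.
```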

The Tremors from L’Aquila

Posted in Bad Statistics, Open Access, Science Politics on October 23, 2012 by telescoper

I can’t resist a comment on news which broke yesterday that an Italian court has found six scientists and a former government official guilty of manslaughter in connection with the L’Aquila Earthquake of 2009. Scientific colleagues of mine are shocked by their conviction and by the severity of the sentences (six years’ imprisonment), the assumption being that they were convicted for having failed to predict the earthquake. However, as Nature News pointed out long before the trial when the scientists were indicted:

The view from L’Aquila, however, is quite different. Prosecutors and the families of victims alike say that the trial has nothing to do with the ability to predict earthquakes, and everything to do with the failure of government-appointed scientists serving on an advisory panel to adequately evaluate, and then communicate, the potential risk to the local population. The charges, detailed in a 224-page document filed by Picuti, allege that members of the National Commission for Forecasting and Predicting Great Risks, who held a special meeting in L’Aquila the week before the earthquake, provided “incomplete, imprecise, and contradictory information” to a public that had been unnerved by months of persistent, low-level tremors. Picuti says that the commission was more interested in pacifying the local population than in giving clear advice about earthquake preparedness.

“I’m not crazy,” Picuti says. “I know they can’t predict earthquakes. The basis of the charges is not that they didn’t predict the earthquake. As functionaries of the state, they had certain duties imposed by law: to evaluate and characterize the risks that were present in L’Aquila.” Part of that risk assessment, he says, should have included the density of the urban population and the known fragility of many ancient buildings in the city centre. “They were obligated to evaluate the degree of risk given all these factors,” he says, “and they did not.”

Many of my colleagues have interpreted the conviction of these scientists as an attack on science, but the above statement actually looks to me more like a demand that the scientists involved should have been more scientific. By that I mean not giving a simple “yes” or “no” answer (which in this case was “no”) but giving a proper scientific analysis of the probabilities involved. This comment goes straight to two issues that I feel very strongly about. One is the vital importance of probabilistic reasoning – in this case in connection with a risk assessment – and the other is the need for openness in science.

I thought I’d take this opportunity to repeat the reasons I think statistics and statistical reasoning are so important. Of course they are important in science. In fact, I think they lie at the very core of the scientific method, although I am still surprised how few practising scientists are comfortable even with statistical language. A more important problem is the popular impression that science is about facts and absolute truths. It isn’t. It’s a process. In order to advance, it has to question itself.

Statistical reasoning also applies outside science to many facets of everyday life, including business, commerce, transport, the media, and politics. It is a feature of everyday life that science and technology are deeply embedded in every aspect of what we do each day. Science has given us greater levels of comfort, better health care, and a plethora of labour-saving devices. It has also given us unprecedented ability to destroy the environment and each other, whether through accident or design. Probability even plays a role in personal relationships, though mostly at a subconscious level.

Civilized societies face severe challenges in this century. We must confront the threat of climate change and forthcoming energy crises. We must find better ways of resolving conflicts peacefully lest nuclear or conventional weapons lead us to global catastrophe. We must stop large-scale pollution or systematic destruction of the biosphere that nurtures us. And we must do all of these things without abandoning the many positive things that science has brought us. Abandoning science and rationality by retreating into religious or political fundamentalism would be a catastrophe for humanity.

Unfortunately, recent decades have seen a wholesale breakdown of trust between scientists and the public at large; the conviction of the scientists in the L’Aquila case is just one example. This breakdown is due partly to the deliberate abuse of science for immoral purposes, and partly to the sheer carelessness with which various agencies have exploited scientific discoveries without proper evaluation of the risks involved. The abuse of statistical arguments has undoubtedly contributed to the suspicion with which many individuals view science.

There is an increasing alienation between scientists and the general public. Many fewer students enrol for courses in physics and chemistry than a few decades ago. Fewer graduates mean fewer qualified science teachers in schools. This is a vicious cycle that threatens our future. It must be broken.

The danger is that the decreasing level of understanding of science in society means that knowledge (as well as its consequent power) becomes concentrated in the minds of a few individuals. This could have dire consequences for the future of our democracy. Even as things stand now, very few Members of Parliament are scientifically literate. How can we expect to control the application of science when the necessary understanding rests with an unelected “priesthood” that is hardly understood by, or represented in, our democratic institutions?

Very few journalists or television producers know enough about science to report sensibly on the latest discoveries or controversies. As a result, important matters that the public needs to know about do not appear at all in the media, or if they do it is in such a garbled fashion that they do more harm than good.

Years ago I used to listen to radio interviews with scientists on the Today programme on BBC Radio 4. I even did such an interview once. It is a deeply frustrating experience. The scientist usually starts by explaining what the discovery is about in the way a scientist should, with careful statements of what is assumed, how the data is interpreted, what the other possible interpretations might be, and what the likely sources of error are. The interviewer then loses patience and asks for a yes or no answer. The scientist tries to continue, but is badgered. Either the interview ends as a row, or the scientist ends up stating a grossly oversimplified version of the story.

Some scientists offer the oversimplified version at the outset, of course, and these are the ones that contribute to the image of scientists as priests. Such individuals often believe in their theories in exactly the same way that some people believe religiously. Not with the conditional and possibly temporary belief that characterizes the scientific method, but with the unquestioning fervour of an unthinking zealot. This approach may pay off for the individual in the short term, in popular esteem and media recognition – but when it goes wrong it is science as a whole that suffers. When a result that has been proclaimed certain is later shown to be false, the result is widespread disillusionment. And the more secretive the behaviour of the scientific community, the less reason the public has to trust its pronouncements.

I don’t have any easy answers to the question of how to cure this malaise, but I do have a few suggestions. It would be easy for a scientist such as myself to blame everything on the media and the education system, but in fact I think the responsibility lies mainly with ourselves. We are so obsessed with our own research, and with the need to publish specialist papers by the lorry-load in order to advance our careers, that we usually spend very little time explaining what we do to the public or why we do it.

I think every working scientist in the country should be required to spend at least 10% of their time working in schools or with the general media on “outreach”, including writing blogs like this. People in my field – astronomers and cosmologists – do this quite a lot, but these are areas where the public has some empathy with what we do. If only biologists, chemists, nuclear physicists and the rest were viewed in such a friendly light. Doing this sort of thing is not easy, especially when it comes to saying something on the radio that the interviewer does not want to hear. Media training for scientists has been a welcome recent innovation for some branches of science, but most of my colleagues have never had any help at all in this direction.

The second thing that must be done is to improve the dire state of science education in schools. Over the last two decades the national curriculum for British schools has been dumbed down to the point of absurdity. Pupils that leave school at 18 having taken “Advanced Level” physics do so with no useful knowledge of physics at all, even if they have obtained the highest grade. I do not at all blame the students for this; they can only do what they are asked to do. It’s all the fault of the educationalists, who have done the best they can for a long time to convince our young people that science is too hard for them. Science can be difficult, of course, and not everyone will be able to make a career out of it. But that doesn’t mean that it should not be taught properly to those that can take it in. If some students find it is not for them, then so be it. I always wanted to be a musician, but never had the talent for it.

The third thing that has to be done is for scientists to be far more open. Publicly-funded scientists have a duty not only to publish their conclusions in such a way that the public can access them freely, but also to publish their data, their methodology and the intermediate steps. Most members of the public will struggle to make sense of the information, but at least they will be able to see that nothing is being deliberately concealed.

Everyone knows that earthquake prediction is practically impossible to do accurately. The danger of the judgement in the L’Aquila Earthquake trial (apart from discouraging scientists from ever becoming seismologists) is that the alarm will be sounded every time there is the smallest tremor. The potential for panic is enormous. But the science in this field, as in any other, does not actually tell one how to act on evidence of risk, merely how to assess it. It’s up to others to decide whether and when to act, that is, when the threshold of danger has been crossed. There is no scientific answer to the question “how risky is too risky?”.

So instead of bland reassurances or needless panic-mongering, the scientific community should refrain from public statements about what will happen and what won’t, and instead busy itself with the collection, analysis and interpretation of data, publishing its studies as openly as possible. The public will find it very difficult to handle this information overload, but that is how it should be. Difficult questions don’t have simple answers. Scientists aren’t priests.

Value Added?

Posted in Bad Statistics, Education on October 22, 2012 by telescoper

Busy busy busy. Only a few minutes for a lunchtime post today. I’ve a feeling I’m going to be writing that rather a lot over the next few weeks. Anyway, I thought I’d use the opportunity to enlist the help of the blogosphere to try to solve a problem for me.

Yesterday I drew attention to the Guardian University league tables for Physics (purely for the purposes of pointing out that excellent departments exist outside the Russell Group). One thing I’ve never understood about these league tables is the column marked “value added”. Here is the (brief) explanation offered:

The value-added score compares students’ individual degree results with their entry qualifications, to show how effective the teaching is. It is given as a rating out of 10.

If you look at the scores you will find the top department, Oxford, has a score of 6 for “value added”; in deference to my alma mater, I’ll note that Cambridge doesn’t appear in these tables. Sussex scores 9 on value-added, while Cardiff only scores 2. What seems peculiar is that the “typical UCAS scores” for students in these departments are 621, 409 and 420 respectively. To convert these into A-level scores, see here. These should represent the typical entry qualifications of students at the respective institutions.

The point is that Oxford only takes students with very high A-level grades, yet still manages to score a creditable 6/10 on “value added”.  Sussex and Cardiff have very similar scores for entry tariff, significantly lower than Oxford, but differ enormously in “value added” (9 versus 2).

The only interpretation of the latter two points that makes sense to me would be if Sussex turned out many more first-class degrees given its entry qualifications than Cardiff (since their tariff levels are similar, 409 versus 420). But this doesn’t seem to be the case;  the fraction of first-class degrees awarded by Cardiff Physics & Astronomy is broadly in line with the rest of the sector and certainly doesn’t differ by a factor of several compared to Sussex!

These aren’t the only anomalous cases. Elsewhere in the table you can find Exeter and Leeds, which have identical UCAS tariffs (435) but value added scores that differ by a wide margin (9 versus 4, respectively).

And if Oxford only accepts students with the highest A-level scores, how can it score higher on “value added” than a department like Cardiff which takes in many students with lower A-levels and turns at least some of them into first-class graduates? Shouldn’t the Oxford “value added” score be very low indeed, if any Oxford students at all fail to get first class degrees?

I think there’s a rabbit off. Can anyone explain the paradox to me?

Answers on a postcard please. Or, better, through the comments box.

Rolling Boulders…

Posted in Bad Statistics, The Universe and Stuff on October 13, 2012 by telescoper

I’m a bit slow to get started this morning, since I didn’t get home until the wee small hours after a trip to the Royal Astronomical Society yesterday, followed by a pleasantly tipsy dinner at the Athenaeum with the RAS Club. Anyhow, one of the highlights of the meeting was a presentation by Prof. Gerald Roberts from Birkbeck on Marsquakes: evidence from rolled boulder populations, Cerberus Fossae, Mars.  The talk was based on a recent paper of his (unfortunately behind a paywall), which is about trying to reconstruct the origin and behaviour of “Marsquakes” using evidence from the trails made by rolling boulders, dislodged by seismic activity or vulcanism.  Here is a sample picture showing the kind of trails he’s using – the resolution is such that one pixel is only 20cm!

There are enough trails to allow a statistical analysis of their distribution in space and in terms of size (which can be inferred from the width of the trail). I had some questions about the analysis, but I haven’t been able to read the paper in detail yet, so I won’t comment on that until I’ve done so. The things I remember most from the talk, though, were these remarkable pictures of what a rolling boulder can do on Earth. They were taken after the earthquake in Christchurch, New Zealand, in 2011.

A large boulder was dislodged from the top of the hill behind the house in the second picture. It didn’t just roll, but bounced down the slope (see the large furrow in the first picture; similar bouncing trajectories can be seen in the picture from Mars), smashed straight through the house, exited the other side and came to rest on a road. Yikes.

The Return of the Inductive Detective

Posted in Bad Statistics, Literature, The Universe and Stuff on August 23, 2012 by telescoper

A few days ago an article appeared on the BBC website that discussed the enduring appeal of Sherlock Holmes and related this to the processes involved in solving puzzles. That piece makes a number of points I’ve made before, so I thought I’d update and recycle my previous post on that theme. The main reason for doing so is that it gives me yet another chance to pay homage to the brilliant Jeremy Brett who, in my opinion, is unsurpassed in the role of Sherlock Holmes. It also allows me to return to a philosophical theme I visited earlier this week.

One of the  things that fascinates me about detective stories (of which I am an avid reader) is how often they use the word “deduction” to describe the logical methods involved in solving a crime. As a matter of fact, what Holmes generally uses is not really deduction at all, but inference (a process which is predominantly inductive).

In deductive reasoning, one tries to tease out the logical consequences of a premise; the resulting conclusions are, generally speaking, more specific than the premise. “If these are the general rules, what are the consequences for this particular situation?” is the kind of question one can answer using deduction.

The kind of reasoning Holmes employs, however, is essentially the opposite of this. The question being answered is of the form: “From a particular set of observations, what can we infer about the more general circumstances relating to them?”.

And for a dramatic illustration of the process of inference, you can see it acted out by the great Jeremy Brett in the first four minutes or so of this clip from the classic Granada TV adaptation of The Hound of the Baskervilles:

I think it’s pretty clear in this case that what’s going on here is a process of inference (i.e. inductive rather than deductive reasoning). It’s also pretty clear, at least to me, that Jeremy Brett’s acting in that scene is utterly superb.

I’m probably labouring the distinction between induction and deduction, but the main point of doing so is that a great deal of science is fundamentally inferential and, as a consequence, it entails dealing with inferences (or guesses or conjectures) that are inherently uncertain as to their application to real facts. Dealing with these uncertain aspects requires a more general kind of logic than the simple Boolean form employed in deductive reasoning. This side of the scientific method is sadly neglected in most approaches to science education.

In physics, the attitude is usually to establish the rules (“the laws of physics”) as axioms (though perhaps giving some experimental justification). Students are then taught to solve problems which generally involve working out particular consequences of these laws. This is all deductive. I’ve got nothing against this: it is what a great deal of theoretical research in physics is actually like, and it forms an essential part of the training of a physicist.

However, one of the aims of physics – especially fundamental physics – is to try to establish what the laws of nature actually are from observations of particular outcomes. It would be simplistic to say that this was entirely inductive in character. Sometimes deduction plays an important role in scientific discoveries. For example, Albert Einstein deduced his Special Theory of Relativity from a postulate that the speed of light was constant for all observers in uniform relative motion. However, the motivation for this entire chain of reasoning arose from previous studies of electromagnetism which involved a complicated interplay between experiment and theory that eventually led to Maxwell’s equations. Deduction and induction are both involved at some level in a kind of dialectical relationship.

The synthesis of the two approaches requires an evaluation of the evidence the data provides concerning the different theories. This evidence is rarely conclusive, so a wider range of logical possibilities than “true” or “false” needs to be accommodated. Fortunately, there is a quantitative and logically rigorous way of doing this. It is called Bayesian probability. In this way of reasoning, the probability (a number between 0 and 1 attached to a hypothesis, model, or anything that can be described as a logical proposition of some sort) represents the extent to which a given set of data supports the given hypothesis. The calculus of probabilities only reduces to Boolean algebra when the probabilities of all the hypotheses involved are either unity (certainly true) or zero (certainly false). In between “true” and “false” there are varying degrees of “uncertain”, represented by a number between 0 and 1, i.e. the probability.
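
For completeness, the update rule being referred to is just Bayes’ theorem which, for a hypothesis $H$ and data $D$, reads

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)},$$

with $P(H)$ the prior probability, $P(D \mid H)$ the likelihood and $P(H \mid D)$ the posterior; when every probability involved is either 0 or 1 this calculus collapses to ordinary Boolean deduction.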

Overlooking the importance of inductive reasoning has led to numerous pathological developments that have hindered the growth of science. One example is the widespread and remarkably naive devotion that many scientists have towards the philosophy of the anti-inductivist Karl Popper; his doctrine of falsifiability has led to an unhealthy neglect of  an essential fact of probabilistic reasoning, namely that data can make theories more probable. More generally, the rise of the empiricist philosophical tradition that stems from David Hume (another anti-inductivist) spawned the frequentist conception of probability, with its regrettable legacy of confusion and irrationality.

In fact Sherlock Holmes himself explicitly recognizes the importance of inference and rejects the one-sided doctrine of falsification. Here he is in The Adventure of the Cardboard Box (the emphasis is mine):

Let me run over the principal steps. We approached the case, you remember, with an absolutely blank mind, which is always an advantage. We had formed no theories. We were simply there to observe and to draw inferences from our observations. What did we see first? A very placid and respectable lady, who seemed quite innocent of any secret, and a portrait which showed me that she had two younger sisters. It instantly flashed across my mind that the box might have been meant for one of these. I set the idea aside as one which could be disproved or confirmed at our leisure.

My own field of cosmology provides the largest-scale illustration of this process in action. Theorists make postulates about the contents of the Universe and the laws that describe it and try to calculate what measurable consequences their ideas might have. Observers make measurements as best they can, but these are inevitably restricted in number and accuracy by technical considerations. Over the years, theoretical cosmologists deductively explored the possible ways Einstein’s General Theory of Relativity could be applied to the cosmos at large. Eventually a family of theoretical models was constructed, each of which could, in principle, describe a universe with the same basic properties as ours. But determining which, if any, of these models applied to the real thing required more detailed data. For example, observations of the properties of individual galaxies led to the inferred presence of cosmologically important quantities of dark matter. Inference also played a key role in establishing the existence of dark energy as a major part of the overall energy budget of the Universe. The result is that we have now arrived at a standard model of cosmology which accounts pretty well for most relevant data.

Nothing is certain, of course, and this model may well turn out to be flawed in important ways. All the best detective stories have twists in which the favoured theory turns out to be wrong. But although the puzzle isn’t exactly solved, we’ve got good reasons for thinking we’re nearer to at least some of the answers than we were 20 years ago.

I think Sherlock Holmes would have approved.

Kuhn the Irrationalist

Posted in Bad Statistics, The Universe and Stuff on August 19, 2012 by telescoper

There’s an article in today’s Observer marking the 50th anniversary of the publication of Thomas Kuhn’s book The Structure of Scientific Revolutions.  John Naughton, who wrote the piece, claims that this book “changed the way we look at science”. I don’t agree with this view at all, actually. There’s little in Kuhn’s book that isn’t implicit in the writings of Karl Popper and little in Popper’s work that isn’t implicit in the work of a far more important figure in the development of the philosophy of science, David Hume. The key point about all these authors is that they failed to understand the central role played by probability and inductive logic in scientific research. In the following I’ll try to explain how I think it all went wrong. It might help the uninitiated to read an earlier post of mine about the Bayesian interpretation of probability.

It is ironic that the pioneers of probability theory and its application to scientific research, principally Laplace, unquestionably adopted a Bayesian rather than a frequentist interpretation for their probabilities. Frequentism arose during the nineteenth century and held sway until relatively recently. I recall giving a conference talk about Bayesian reasoning only to be heckled by the audience with comments about “new-fangled, trendy Bayesian methods”. Nothing could have been less apt. Probability theory pre-dates the rise of sampling theory and all the other frequentist-inspired techniques that many modern-day statisticians like to employ.

Most disturbing of all is the influence that frequentist and other non-Bayesian views of probability have had upon the development of the philosophy of science, given that the scientific method itself has, I believe, a strong element of inverse reasoning, or inductivism, in it. The argument about whether there is a role for this type of thought in science goes back at least as far as Roger Bacon, who lived in the 13th Century. Much later the brilliant Scottish empiricist philosopher and Enlightenment figure David Hume argued strongly against induction. Most modern anti-inductivists can be traced back to this source. Pierre Duhem has argued that theory and experiment never meet face-to-face because in reality there are hosts of auxiliary assumptions involved in making this comparison. This is nowadays called the Quine-Duhem thesis.

Actually, for a Bayesian this doesn’t pose a logical difficulty at all. All one has to do is set up prior probability distributions for the required parameters, calculate their posterior probabilities and then integrate over those that aren’t of direct interest. This is just an expanded version of the idea of marginalization, explained here.
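
As a concrete (and entirely schematic) illustration of that procedure, here is a toy example of my own: a straight-line fit in which the slope is the parameter of interest and the intercept plays the role of a nuisance parameter to be integrated out. The model, data and grids are all invented for the sketch.

```python
# Toy illustration (my own example, not from the post) of marginalization:
# compute an unnormalized posterior over a parameter of interest m and a
# nuisance parameter b on a grid, then integrate b out to get p(m | data).

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = m*x + b with Gaussian noise. We care about m, not b.
m_true, b_true, sigma = 2.0, 0.5, 0.3
x = np.linspace(0.0, 1.0, 20)
y = m_true * x + b_true + rng.normal(0.0, sigma, size=x.size)

m_grid = np.linspace(1.0, 3.0, 201)
b_grid = np.linspace(-1.0, 2.0, 301)
M, B = np.meshgrid(m_grid, b_grid, indexing="ij")

# Flat priors over the grid; Gaussian log-likelihood for each (m, b) pair.
resid = y[None, None, :] - (M[..., None] * x[None, None, :] + B[..., None])
log_post = -0.5 * np.sum((resid / sigma) ** 2, axis=-1)
post = np.exp(log_post - log_post.max())     # unnormalized posterior p(m, b | data)

dm = m_grid[1] - m_grid[0]
db = b_grid[1] - b_grid[0]
p_m = post.sum(axis=1) * db                  # marginalize: integrate over b
p_m /= p_m.sum() * dm                        # normalize p(m | data)

m_mean = (m_grid * p_m).sum() * dm
m_std = np.sqrt(((m_grid - m_mean) ** 2 * p_m).sum() * dm)
print(f"marginal posterior for m: {m_mean:.2f} +/- {m_std:.2f} (true value {m_true})")
```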

Rudolf Carnap, a logical positivist, attempted to construct a complete theory of inductive reasoning which bears some relationship to Bayesian thought, but he failed to apply Bayes’ theorem in the correct way. Carnap distinguished between two types of probability – logical and factual. Bayesians don’t – and I don’t – think this is necessary. The Bayesian definition seems to me to be quite coherent on its own.

Other philosophers of science reject the notion that inductive reasoning has any epistemological value at all. This anti-inductivist stance, often somewhat misleadingly called deductivist (irrationalist would be a better description), is evident in the thinking of three of the most influential philosophers of science of the last century: Karl Popper, Thomas Kuhn and, most recently, Paul Feyerabend. Regardless of the ferocity of their arguments with each other, these thinkers have in common that at the core of their systems of thought lies the rejection of all forms of inductive reasoning. The line of thought that ended in this intellectual cul-de-sac began, as I stated above, with the work of the Scottish empiricist philosopher David Hume. For a thorough analysis of the anti-inductivists mentioned above and their obvious debt to Hume, see David Stove’s book Popper and After: Four Modern Irrationalists. I will just make a few inflammatory remarks here.

Karl Popper really began the modern era of science philosophy with his Logik der Forschung, which was published in 1934. There isn’t really much about (Bayesian) probability theory in this book, which is strange for a work which claims to be about the logic of science. Popper also managed, on the one hand, to accept probability theory (in its frequentist form) but, on the other, to reject induction. I find it therefore very hard to make sense of his work at all. It is also clear that, at least outside Britain, Popper is not really taken seriously by many people as a philosopher. Inside Britain it is very different and I’m not at all sure I understand why. Nevertheless, in my experience, most working physicists seem to subscribe to some version of Popper’s basic philosophy.

Among the things Popper has claimed is that all observations are “theory-laden” and that “sense-data, untheoretical items of observation, simply do not exist”. I don’t think it is possible to defend this view, unless one asserts that numbers do not exist. Data are numbers. They can be incorporated in the form of propositions about parameters in any theoretical framework we like. It is of course true that the possibility space is theory-laden. It is a space of theories, after all. Theory does suggest what kinds of experiment should be done and what data is likely to be useful. But data can be used to update probabilities of anything.

Popper has also insisted that science is deductive rather than inductive. Part of this claim is just a semantic confusion. It is necessary at some point to deduce what the measurable consequences of a theory might be before one does any experiments, but that doesn’t mean the whole process of science is deductive. He does, however, reject the basic application of inductive reasoning in updating probabilities in the light of measured data; he asserts that no theory ever becomes more probable when evidence is found in its favour: every scientific theory begins infinitely improbable, and is doomed to remain so.

Now there is a grain of truth in this, or can be if the space of possibilities is infinite. Standard methods for assigning priors often spread the unit total probability over an infinite space, leading to a prior probability which is formally zero. This is the problem of improper priors. But this is not a killer blow to Bayesianism. Even if the prior is not strictly normalizable, the posterior probability can be. In any case, given sufficient relevant data the cycle of experiment-measurement-update of probability assignment usually soon leaves the prior far behind. Data usually count in the end.
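
A standard textbook illustration of that point (not tied to any particular physical example): take a flat, improper prior $p(\mu) \propto \text{const}$ for a location parameter $\mu$ over the whole real line, together with a Gaussian likelihood from $n$ independent measurements $x_i$ with known error $\sigma$. Then

$$p(\mu \mid x_1,\dots,x_n) \propto \prod_{i=1}^{n}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right] \propto \exp\left[-\frac{n(\mu-\bar{x})^2}{2\sigma^2}\right],$$

which is a perfectly proper (normalizable) Gaussian posterior, centred on the sample mean $\bar{x}$ with width $\sigma/\sqrt{n}$, even though the prior itself could not be normalized.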

The idea by which Popper is best known is the dogma of falsification. According to this doctrine, a hypothesis is only said to be scientific if it is capable of being proved false. In real science certain “falsehood” and certain “truth” are almost never achieved. Theories are simply more probable or less probable than the alternatives on the market. The idea that experimental scientists struggle through their entire life simply to prove theorists wrong is a very strange one, although I definitely know some experimentalists who chase theories like lions chase gazelles. To a Bayesian, the right criterion is not falsifiability but testability, the ability of the theory to be rendered more or less probable using further data. Nevertheless, scientific theories generally do have untestable components. Any theory has its interpretation, which is the untestable baggage that we need to supply to make it comprehensible to us. But whatever can be tested can be scientific.

Popper’s work on the philosophical ideas that ultimately led to falsificationism began in Vienna, but the approach subsequently gained enormous popularity in western Europe. The American Thomas Kuhn later took up the anti-inductivist baton in his book The Structure of Scientific Revolutions. Initially a physicist, Kuhn undoubtedly became a first-rate historian of science and this book contains many perceptive analyses of episodes in the development of physics. His view of scientific progress is cyclic. It begins with a mass of confused observations and controversial theories, moves into a quiescent phase when one theory has triumphed over the others, and lapses into chaos again when further testing exposes anomalies in the favoured theory. Kuhn adopted the word paradigm to describe the model that rules during the middle stage.

The history of science is littered with examples of this process, which is why so many scientists find Kuhn’s account in good accord with their experience. But there is a problem when attempts are made to fuse this historical observation into a philosophy based on anti-inductivism. Kuhn claims that we “have to relinquish the notion that changes of paradigm carry scientists … closer and closer to the truth.” Einstein’s theory of relativity provides a closer fit to a wider range of observations than Newtonian mechanics, but in Kuhn’s view this success counts for nothing.

Paul Feyerabend has extended this anti-inductivist streak to its logical (though irrational) extreme. His approach has been dubbed “epistemological anarchism”, and it is clear that he believed that all theories are equally wrong. He is on record as stating that normal science is a fairytale, and that equal time and resources should be spent on “astrology, acupuncture and witchcraft”. He also categorised science alongside “religion, prostitution, and so on”. His thesis is basically that science is just one of many possible internally consistent views of the world, and that the choice between which of these views to adopt can only be made on socio-political grounds.

Feyerabend’s views could only have flourished in a society deeply disillusioned with science. Of course, many bad things have been done in science’s name, and many social institutions are deeply flawed. One can’t expect anything operated by people to run perfectly. It’s also quite reasonable to argue on ethical grounds which bits of science should be funded and which should not. But the bottom line is that science does have a firm methodological basis which distinguishes it from pseudo-science, the occult and new age silliness. Science is distinguished from other belief-systems by its rigorous application of inductive reasoning and its willingness to subject itself to experimental test. Not all science is done properly, of course, and bad science is as bad as anything.

The Bayesian interpretation of probability leads to a philosophy of science which is essentially epistemological rather than ontological. Probabilities are not “out there” in external reality, but in our minds, representing our imperfect knowledge and understanding. Scientific theories are not absolute truths. Our knowledge of reality is never certain, but we are able to reason consistently about which of our theories provides the best available description of what is known at any given time. If that description fails when more data are gathered, we move on, introducing new elements or abandoning the theory for an alternative. This process could go on forever. There may never be a “final” theory, and scientific truths are consequently far from absolute, but that doesn’t mean that there is no progress.