Archive for Physics

Scientists in Residence

Posted in Biographical, The Universe and Stuff with tags , , , on February 23, 2010 by telescoper

I’ve managed to get through the first couple of days of what promises to be a very hectic week without feeling too much of the strain, which is quite a pleasant surprise given my advancing senility.

This week a whole bunch of Cardiff astronomers are taking part in a Scientists in Residence scheme at Monkton Combe School, which nestles among the lovely hills in the picturesque countryside near Bath. The idea was to try to give the pupils some sort of idea what it’s like being a scientist – specifically an astronomer – by having an intensive series of teaching sessions run by scientists who visit the school for several days running. A whole range of different types have taken part, from graduate students and postdoctoral researchers all the way down to Professors. Some have even been staying overnight there; it’s a boarding school, in fact.

As with most things these days, I’ve been a bit of a freeloader in this thing – the course materials were prepared by others, principally Chris North, so all I had to do was turn up and lend a hand on the day. Members of the department with duties at Cardiff have only been able to go for part of the time and even that has meant, for me at least, a bit of dashing backwards and forwards on the train. On Monday I had a full complement of meetings, lectures and exercise classes in Cardiff before heading off to Bath to give an evening lecture on The Big Bang to what turned out to be quite a large and attentive audience of sixth-form students. When I finished I had to get the train back to Cardiff – about a 70-minute journey – in order to be able to give Columbo his evening insulin fix in good time.

This morning I was up at six to get the train again to Bath – after doing the necessary with Columbo again – in order to take part in a classroom session where we took the students through activities centred around the idea of using the orbital motions of astronomical objects to work out masses. I found this very interesting. On the one hand the students were keen and very easy to interact with, but on the other this experience reinforced the impression that today’s A-level physics students are given a syllabus that is diluted beyond all recognition compared with what older generations of physicists learned. Even in a private school, with excellent laboratory facilities and highly motivated teachers, it is difficult for today’s 16-18 year olds to learn anything meaningful about what physics is really like.

Not having kids of my own, I’ve only observed the changes in educational standards over the last decade indirectly, so the last couple of days have been a bit of a reality check for me. Unless someone can be persuaded to force schools to teach science properly again, university lecturers will have to carry on doing what is essentially remedial teaching.

Anyway, I’ve found the last couple of days very interesting and I hope the others taking part in the week will enjoy it as much as I did.

You might reasonably ask why a bunch of University academics – mainly funded by the taxpayer – should be running backwards and forwards organizing activities for a posh private school. The mercenary answer is, of course, that some of the kids we’ve been talking to might actually turn into Cardiff undergraduates one day and even if only one does so, the income that generates for the School of Physics & Astronomy more than pays for the number of person-hours we have put in. But even if that doesn’t happen it’s still worth it. Our plan is to offer this type of activity to all kinds of schools in the local area, not only for our own recruitment, but also for the general purpose of “outreach”, communicating an interest in science to society beyond academia. This week is the first time we’ve done it. Undoubtedly some things will work and others won’t. This week we will iron out some of the problems before we take it on the road to more challenging audiences.

It will need to be a good show if it is to go down well in the Valley Comprehensives, and what better way to improve it than to practise on the rich kids?

Killing Vectors

Posted in The Universe and Stuff with tags , , , on February 16, 2010 by telescoper

I’ve been feeling a rant coming for some time now. Since I started teaching again three weeks ago, actually. The target of my vitriol this time is the teaching of Euclidean vectors. Not vectors themselves, of course. I like vectors. They’re great. The trouble is the way we’re forced to write them these days when we use them in introductory level physics classes.

You see, when I was a lad, I was taught to write a geometric vector in the following fashion:

\underline{r} =\left(\begin{array}{c} x \\ y \\ z \end{array} \right).

This is a simple column vector, where x,y,z are the components in a three-dimensional cartesian coordinate system. Other kinds of vector, such as those representing states in quantum mechanics, or anywhere else where linear algebra is used, can easily be represented in a similar fashion.

This notation is great because it’s very easy to calculate the scalar (dot) and vector (cross) products of two such objects by writing them in column form next to each other and performing a simple bit of manipulation. For example, the scalar product of the two vectors

\underline{u}=\left(\begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right) and \underline{v}=\left(\begin{array}{c} 1\\ 1 \\ -2 \end{array} \right)

can easily be found by multiplying the corresponding elements of each together and totting them up:

\underline{u}\cdot \underline{v} = (1 \times 1) + (1\times 1) + (1\times -2) =0,

showing immediately that these two vectors are orthogonal. In normalised form, these two particular vectors  appear in other contexts in physics, where they have a more abstract interpretation than simple geometry, such as in the representation of the gluon in particle physics.

Moreover, writing vectors like this makes it a lot easier to transform them via the action of a matrix, by multiplying rows by columns in the usual fashion, e.g.

\left(\begin{array}{ccc} \cos \theta & \sin\theta & 0 \\ -\sin\theta & \cos \theta & 0 \\ 0 & 0 & 1\end{array} \right) \left(\begin{array}{c} x \\ y \\ z \end{array} \right) = \left(\begin{array}{c} x\cos \theta + y\sin\theta \\ -x \sin \theta + y\cos \theta \\ z \end{array} \right)

which corresponds to a rotation of the vector in the x-y plane. Transposing a column vector into a row vector is easy too.
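Incidentally, this column picture is also exactly how numerical libraries handle vectors. Here’s a quick Python (numpy) sketch of the two calculations above; the rotation angle and the position vector being rotated are arbitrary choices of mine, purely for illustration:

import numpy as np

u = np.array([1, 1, 1])
v = np.array([1, 1, -2])
print(np.dot(u, v))                    # 0, confirming the two vectors are orthogonal

theta = np.pi / 3                      # an arbitrary rotation angle
R = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])
r = np.array([1.0, 2.0, 3.0])          # an arbitrary position vector
print(R @ r)                           # the same vector rotated in the x-y plane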

Well, that’s how I was taught to do it.

However, somebody, sometime, decided that, in Britain at least, this concise and computationally helpful notation had to be jettisoned and students instead must be forced to write

\underline{r} = x \underline{\hat{i}} + y \underline{\hat{j}} + z \underline{\hat{k}}

Some of you may even be used to doing it that way yourself. Why is this awful? For a start, it’s incredibly clumsy. It is less intuitive, doesn’t lend itself to easy operations on the vectors like I described above, doesn’t translate easily into the more general case of a matrix, and is generally just …well… awful.

Worse still, for the purpose of teaching inexperienced students physics, it offers the possibility of horrible notational confusion. In particular, the unit vector \underline{\hat{i}} is too easily confused with i, the square root of minus one. Introduce a plane wave with a wavevector \underline{k} and it gets even worse, especially when you want to write \exp(i\underline{k}\cdot\underline{x})!

No, give me the row and column notation any day.

What I would really like to know is who decided that our schools had to teach the horrible notation, rather than the nice one, and why. I think everyone who teaches physics knows that a clear and user-friendly notation is an enormous help and a bad one is an enormous hindrance. It doesn’t surprise me that some students struggle with even simple mathematics when it’s presented in such a silly way. On those grounds, I refuse to play ball, and always use the better notation.

Call me old-fashioned.

A Little Bit of Quantum

Posted in The Universe and Stuff with tags , , , , , , , , , , , on January 16, 2010 by telescoper

I’m trying to avoid getting too depressed by writing about the ongoing funding crisis for physics in the United Kingdom, so by way of a distraction I thought I’d post something about physics itself rather than the way it is being torn apart by short-sighted bureaucrats. A number of Cardiff physics students are currently looking forward (?) to their Quantum Mechanics examinations next week, so I thought I’d try to remind them of what a fascinating subject it really is…

The development of the kinetic theory of gases in the latter part of the 19th Century represented the culmination of a mechanistic approach to Natural Philosophy that had begun with Isaac Newton two centuries earlier. So successful had this programme been by the turn of the 20th century that it was a fairly common view among scientists of the time that there was virtually nothing important left to be “discovered” in the realm of natural philosophy. All that remained were a few bits and pieces to be tidied up, but nothing could possibly shake the foundations of Newtonian mechanics.

But shake they certainly did. In 1905 the young Albert Einstein – surely the greatest physicist of the 20th century, if not of all time – single-handedly overthrew the underlying basis of Newton’s world with the introduction of his special theory of relativity. Although it took some time before this theory was tested experimentally and gained widespread acceptance, it blew an enormous hole in the mechanistic conception of the Universe by drastically changing the conceptual underpinning of Newtonian physics. Out were the “commonsense” notions of absolute space and absolute time, and in was a more complex “space-time” whose measurable aspects depended on the frame of reference of the observer.

Relativity, however, was only half the story. Another, perhaps even more radical shake-up was also in train at the same time. Although Einstein played an important role in this advance too, it led to a theory he was never comfortable with: quantum mechanics. A hundred years on, the full implications of this view of nature are still far from understood, so maybe Einstein was correct to be uneasy.

The birth of quantum mechanics partly arose from the developments of kinetic theory and statistical mechanics that I discussed briefly in a previous post. Inspired by such luminaries as James Clerk Maxwell and Ludwig Boltzmann, physicists had inexorably increased the range of phenomena that could be brought within the descriptive framework furnished by Newtonian mechanics and the new modes of statistical analysis that they had founded. Maxwell had also been responsible for another major development in theoretical physics: the unification of electricity and magnetism into a single system known as electromagnetism. Out of this mathematical tour de force came the realisation that light was a form of electromagnetic wave, an oscillation of electric and magnetic fields through apparently empty space.  Optical light forms just part of the possible spectrum of electromagnetic radiation, which ranges from very long wavelength radio waves at one end to extremely short wave gamma rays at the other.

With Maxwell’s theory in hand, it became possible to think about how atoms and molecules might exchange energy and reach equilibrium states not just with each other, but with light. Everyday experience tells us that hot things tend to give off radiation, and a number of experiments – by Wilhelm Wien and others – had shown that there are well-defined rules determining what type of radiation (i.e. what wavelength) and how much of it is given off by a body held at a certain temperature. In a nutshell, hotter bodies give off more radiation (proportional to the fourth power of their temperature), and the peak wavelength is shorter for hotter bodies. At room temperature, bodies give off infra-red radiation; stars have surface temperatures measured in thousands of degrees, so they give off predominantly optical and ultraviolet light. Our Universe is suffused with microwave radiation corresponding to just a few degrees above absolute zero.

The name given to a body in thermal equilibrium with a bath of radiation is a “black body”, not because it is black – the Sun is quite a good example of a black body and it is not black at all – but because it is simultaneously a perfect absorber and perfect emitter of radiation. In other words, it is a body which is in perfect thermal contact with the light it emits. Surely it would be straightforward to apply classical Maxwell-style statistical reasoning to a black body at some temperature?

It did indeed turn out to be straightforward, but the result was a catastrophe. One can see the nature of the disaster very straightforwardly by taking a simple idea from classical kinetic theory. In many circumstances there is a “rule of thumb” that applies to systems in thermal equilibrium. Roughly speaking, the idea is that energy becomes divided equally between every possible “degree of freedom” the system possesses. For example, if a box of gas consists of particles that can move in three dimensions then, on average, each component of the velocity of a particle will carry the same amount of kinetic energy. Molecules are able to rotate and vibrate as well as move about inside the box, and the equipartition rule can apply to these modes too.
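To put a number on it, the equipartition theorem assigns an average energy

\langle E \rangle = \frac{1}{2}k_BT

to each quadratic degree of freedom, so a monatomic gas particle free to move in three dimensions carries \frac{3}{2}k_BT of kinetic energy on average.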

Maxwell had shown that light was essentially a kind of vibration, so it appeared obvious that what one had to do was to assign the same amount of energy to each possible vibrational degree of freedom of the ambient electromagnetic field. Lord Rayleigh and Sir James Jeans did this calculation and found that the amount of energy radiated by a black body as a function of wavelength should vary proportionally to the temperature T and inversely as the fourth power of the wavelength λ, as shown in the diagram for an example temperature of 5000K.
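For the record, in terms of the spectral radiance the Rayleigh-Jeans law reads

B_{\lambda}(T) = \frac{2ck_BT}{\lambda^{4}},

which clearly grows without limit as the wavelength goes to zero.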

Even without doing any detailed experiments it is clear that this result just has to be nonsense. The Rayleigh-Jeans law predicts that even very cold bodies should produce infinite amounts of radiation at infinitely short wavelengths, i.e. in the ultraviolet. It also predicts that the total amount of radiation – the area under the curve in the above figure – is infinite. Even a very cold body should emit infinitely intense electromagnetic radiation. Infinity is bad.

Experiments show that the Rayleigh-Jeans law does work at very long wavelengths but in reality the radiation reaches a maximum (at a wavelength that depends on the temperature) and then declines at short wavelengths, as shown also in the above Figure. Clearly something is very badly wrong with the reasoning here, although it works so well for atoms and molecules.

It wouldn’t be accurate to say that physicists all stopped in their tracks because of this difficulty. It is amazing the extent to which people are able to carry on despite the presence of obvious flaws in their theory. It takes a great mind to realise when everyone else is on the wrong track, and a considerable time for revolutionary changes to become accepted. In the meantime, the run-of-the-mill scientist tends to carry on regardless.

The resolution of this particular fundamental conundrum is accredited to Karl Ernst Ludwig “Max” Planck (right), who was born in 1858. He was the son of a law professor, and himself went to university at Berlin and Munich, receiving his doctorate in 1880. He became professor at Kiel in 1885, and moved to Berlin in 1888. In 1930 he became president of the Kaiser Wilhelm Institute, but resigned in 1937 in protest at the behaviour of the Nazis towards Jewish scientists. His life was blighted by family tragedies: his second son died in the First World War; both daughters died in childbirth; and his first son was executed in 1944 for his part in a plot to assassinate Adolf Hitler. After the Second World War the institute was named the Max Planck Institute, and Planck was reappointed director. He died in 1947; by then such a famous scientist that his likeness appeared on the two Deutschmark coin issued in 1958.

Planck had taken some ideas from Boltzmann’s work but applied them in a radically new way. The essence of his reasoning was that the ultraviolet catastrophe basically arises because Maxwell’s electromagnetic field is a continuous thing and, as such, appears to have an infinite variety of ways in which it can absorb energy. When you are allowed to store energy in whatever way you like in all these modes, and add them all together you get an infinite power output. But what if there was some fundamental limitation in the way that an atom could exchange energy with the radiation field? If such a transfer can only occur in discrete lumps or quanta – rather like “atoms” of radiation – then one could eliminate the ultraviolet catastrophe at a stroke. Planck’s genius was to realize this, and the formula he proposed contains a constant that still bears his name. The energy of a light quantum E is related to its frequency ν via E=hν, where h is Planck’s constant, one of the fundamental constants that occur throughout theoretical physics.

Boltzmann had shown that if a system possesses discrete energy states, labelled by j and with energies Ej, then at a given temperature the relative occupation of these states is determined by a “Boltzmann factor” of the form:

n_{j} \propto \exp\left(-\frac{E_{j}}{k_BT}\right),

so that the higher energy state is exponentially less probable than the lower energy state if the energy difference is much larger than the typical thermal energy kB T ; the quantity kB is Boltzmann’s constant, another fundamental constant. On the other hand, if the states are very close in energy compared to the thermal level then they will be roughly equally populated in accordance with the “equipartition” idea I mentioned above.

The trouble with the classical treatment of an electromagnetic field is that it makes it too easy for the field to store infinite energy in short wavelength oscillations: it can put  a little bit of energy in each of a lot of modes in an unlimited way. Planck realised that his idea would mean ultra-violet radiation could only be emitted in very energetic quanta, rather than in lots of little bits. Building on Boltzmann’s reasoning, he deduced the probability of exciting a quantum with very high energy is exponentially suppressed. This in turn leads to an exponential cut-off in the black-body curve at short wavelengths. Triumphantly, he was able to calculate the exact form of the black-body curve expected in his theory: it matches the Rayleigh-Jeans form at long wavelengths, but turns over and decreases at short wavelengths just as the measurements require. The theoretical Planck curve matches measurements perfectly over the entire range of wavelengths that experiments have been able to probe.
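Written out explicitly, the Planck formula for the spectral radiance is

B_{\lambda}(T) = \frac{2hc^{2}}{\lambda^{5}}\,\frac{1}{\exp\left(\frac{hc}{\lambda k_BT}\right)-1},

which reduces to the Rayleigh-Jeans form when hc/λk_BT is much less than one (just expand the exponential to first order) and is exponentially suppressed when it is much greater than one.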

Curiously perhaps, Planck stopped short of the modern interpretation of this: that light (and other electromagnetic radiation) is composed of particles which we now call photons. He was still wedded to Maxwell’s description of light as a wave phenomenon, so he preferred to think of the exchange of energy as being quantised rather than the radiation itself. Einstein’s work on the photoelectric effect in 1905 further vindicated Planck, but also demonstrated that light travelled in packets. After Planck’s work, and the development of the quantum theory of the atom pioneered by Niels Bohr, quantum theory really began to take hold of the physics community and eventually it became acceptable to conceive of not just photons but all matter as being part particle and part wave. Photons are examples of a kind of particle known as a boson, and the atomic constituents such as electrons and protons are fermions. (This classification arises from their spin: bosons have spin which is an integer multiple of the reduced Planck constant ħ, whereas fermions have half-integral spin.)

You might have expected that the radical step made by Planck would immediately have led to a drastic overhaul of the system of thermodynamics put in place in the preceding half-a-century, but you would be wrong. In many ways the realization that discrete energy levels were involved in the microscopic description of matter if anything made thermodynamics easier to understand and apply. Statistical reasoning is usually most difficult when the space of possibilities is complicated. In quantum theory one always deals fundamentally with a discrete space of possible outcomes. Counting discrete things is not always easy, but it’s usually easier than counting continuous things. Even when they’re infinite.

Much of modern physics research lies in the arena of condensed matter physics, which deals with the properties of solids and gases, often at the very low temperatures where quantum effects become important. The statistical thermodynamics of these systems is based on a very slight modification of Boltzmann’s result:

n_{j} \propto \left[\exp\left(\frac{E_{j}}{k_BT}\right)\pm 1\right]^{-1},

which gives the equilibrium occupation of states at an energy level Ej; the difference between bosons and fermions manifests itself as the sign in the denominator. Fermions take the upper “plus” sign, and the resulting statistical framework is based on the so-called Fermi-Dirac distribution; bosons have the minus sign and obey Bose-Einstein statistics. This modification of the classical theory of Maxwell and Boltzmann is simple, but leads to a range of fascinating phenomena, from neutron stars to superconductivity.
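A short Python sketch (my own toy example, with energies measured in units of k_BT and the chemical potential set to zero) shows how the two quantum occupation laws compare with the classical Boltzmann factor:

import numpy as np

x = np.linspace(0.5, 5.0, 10)             # energies E_j in units of k_B T

maxwell_boltzmann = np.exp(-x)            # classical Boltzmann factor
bose_einstein = 1.0 / (np.exp(x) - 1.0)   # bosons: minus sign in the denominator
fermi_dirac = 1.0 / (np.exp(x) + 1.0)     # fermions: plus sign in the denominator

for row in zip(x, maxwell_boltzmann, bose_einstein, fermi_dirac):
    print("E/kT = %.1f   MB = %.4f   BE = %.4f   FD = %.4f" % row)

At energies well above k_BT the three expressions agree, while at low energies the Bose-Einstein occupation grows without limit and the Fermi-Dirac occupation never exceeds one per state.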

Moreover, the nature of the ultraviolet catastrophe for black-body radiation at the start of the 20th Century perhaps also holds lessons for modern physics. One of the fundamental problems we have in theoretical cosmology is how to calculate the energy density of the vacuum using quantum field theory. This is a more complicated thing to do than working out the energy in an electromagnetic field, but the net result is a catastrophe of the same sort. All straightforward ways of computing this quantity produce a divergent answer unless a high-energy cut-off is introduced. Although cosmological observations of the accelerating universe suggest that vacuum energy is there, its actual energy density is way too small for any plausible cutoff.

So there we are. A hundred years on, we have another nasty infinity. It’s a fundamental problem, but its answer will probably open up a new way of understanding the Universe.



A Little Bit of Chaos

Posted in The Universe and Stuff with tags , , , , , , , , on November 21, 2009 by telescoper

The era of modern physics could be said to have begun in 1687 with the publication by Sir Isaac Newton of his great Philosophiae Naturalis Principia Mathematica (Principia for short). In this magnificent volume, Newton presented a mathematical theory of all known forms of motion and, for the first time, gave clear definitions of the concepts of force and momentum. Within this general framework he derived a new theory of Universal Gravitation and used it to explain the properties of planetary orbits previously discovered but unexplained by Johannes Kepler. The classical laws of motion and his famous “inverse square law” of gravity have been superseded by more complete theories when dealing with very high speeds or very strong gravity, but they nevertheless continue to supply a very accurate description of our everyday physical world.

Newton’s laws have a rigidly deterministic structure. What I mean by this is that, given precise information about the state of a system at some time then one can use Newtonian mechanics to calculate the precise state of the system at any later time. The orbits of the planets, the positions of stars in the sky, and the occurrence of eclipses can all be predicted to very high accuracy using this theory.

At this point it is useful to mention that most physicists do not use Newton’s laws in the form presented in the Principia, but in a more elegant language named after Sir William Rowan Hamilton. The point about Newton’s laws of motion is that they are expressed mathematically as differential equations: they are expressed in terms of rates of change of things. For instance, the force on a body gives the rate of change of the momentum of the body. Generally speaking, differential equations are very nasty things to solve, which is a shame because a great deal of theoretical physics involves them. Hamilton realised that it was possible to express Newton’s laws in a way that did not involve clumsy mathematics of this type. His formalism was equivalent, in the sense that one could obtain the basic differential equations from it, but easier to use in general situations. The key concept he introduced – now called the Hamiltonian – is a single mathematical function that depends on both the positions q and momenta p of the particles in a system, say H(q,p). This function is constructed from the different forms of energy (kinetic and potential) in the system, and how they depend on the p’s and q’s, but the details of how this works out don’t matter. Suffice to say that knowing the Hamiltonian for a system is tantamount to a full classical description of its behaviour.
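In the standard notation, the equations of motion follow from the Hamiltonian via

\dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i},

one such pair for each degree of freedom, so the single function H really does encode all the dynamics.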

Hamilton was a very interesting character. He was born in Dublin in 1805 and showed an astonishing early flair for languages, speaking 13 of them by the time he was 13. He graduated from Trinity College aged 22, at which point he was clearly a whiz-kid at mathematics as well as languages. He was immediately made professor of astronomy at Dublin and Astronomer Royal for Ireland. However, he turned out to be hopeless at the practicalities of observational work. Despite employing three of his sisters to help him in the observatory he never produced much of astronomical interest. Mathematics and alcohol seem to have been the two real loves of his life.

It is a fascinating historical fact that the development of probability theory during the late 17th and early 18th century coincided almost exactly with the rise of Newtonian Mechanics. It may seem strange in retrospect that there was no great philosophical conflict between these two great intellectual achievements, since they have mutually incompatible views of prediction. Probability applies in unpredictable situations; Newtonian Mechanics says that everything is predictable. The resolution of this conundrum may owe a great deal to Laplace, who contributed greatly to both fields. Laplace, more than any other individual, was responsible for elevating the deterministic world-view of Newton to a scientific principle in its own right. To quote:

We ought then to regard the present state of the Universe as the effect of its preceding state and as the cause of its succeeding state.

According to Laplace’s view, knowledge of the initial conditions pertaining at the instant of creation would be sufficient in order to predict everything that subsequently happened. For him, a probabilistic treatment of phenomena did not conflict with classical theory, but was simply a convenient approach to be taken when the equations of motion were too difficult to be solved exactly. The required probabilities could be derived from the underlying theory, perhaps using some kind of symmetry argument.

The so-called “randomizing” devices used in all traditional gambling games – roulette wheels, dice, coins, bingo machines, and so on – are in fact well described by Newtonian mechanics. We call them “random” because the motions involved are just too complicated to make accurate prediction possible. Nevertheless it is clear that they are just straightforward mechanical devices which are essentially deterministic. On the other hand, we like to think the weather is predictable, at least in principle, but with much less evidence that it is so!

But it is not only systems with large numbers of interacting particles (like the Earth’s atmosphere) that pose problems for predictability. Some deceptively simple systems display extremely erratic behaviour. The theory of these systems is only around fifty years old, and it goes under the general title of nonlinear dynamics. One of the most important landmarks in this field was a study by two astronomers, Michel Hénon and Carl Heiles, in 1964. They were interested in what would happen if you take a system with a known analytical solution and modify it.

In the language of Hamiltonians, let us assume that H0 describes a system whose evolution we know exactly and H1 is some perturbation to it. The Hamiltonian of the modified system is thus

 H(q_i,p_i)=H_0(q_i, p_i) + H_1 (q_i, p_i)

What Hénon and Heiles did was to study a system whose unmodified form is very familiar to physicists: the simple harmonic oscillator. This is a system which, when displaced from its equilibrium, experiences a restoring force proportional to the displacement. The Hamiltonian description for a single simple harmonic oscillator system involves a function that is quadratic in both p and q:

H=\frac{1}{2} \left( q_1^2+p_1^2\right)

The solution of this system is well known: the general form is a sinusoidal motion and it is used in the description of all kinds of wave phenomena, swinging pendulums and so on.
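Indeed, applying Hamilton’s equations to this H gives dq1/dt = p1 and dp1/dt = -q1, so that

\ddot{q}_1 = -q_1 \quad \Rightarrow \quad q_1(t) = A\cos (t+\phi),

with the amplitude A and phase φ fixed by the initial conditions.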

The case Hénon and Heiles looked at had two degrees of freedom, so that the Hamiltonian depends on q1, q2, p1 and p2:

H=\frac{1}{2} \left( q_1^2+p_1^2 + q_2^2+p_2^2\right)

 However, in this example, the two degrees of freedom are independent, meaning that there is uncoupled motion in the two directions. The amplitude of the oscillations is governed by the total energy of the system, which is a constant of the motion. Other than this, the type of behaviour displayed by this system is very rich, as exemplified by the various Lissajous figures shown in the diagram below. Note that all these figures are produced by the same type of dynamical system of equations: the different shapes are consequences of different initial conditions and different coefficients (which I set to unity in the form above).

 

 If the oscillations in each direction have the same frequency then one can get an orbit which is a line or an ellipse. If the frequencies differ then the orbits can be much more complicated, but still pretty. Note that in all these cases the orbit is just a line, i.e. a one-dimensional part of the two-dimensional space drawn on the paper.
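For anyone who wants to play with such figures, here is a little Python sketch along these lines; the frequency ratio and phase below are arbitrary choices of mine, not the values used for the figures above:

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0.0, 20.0 * np.pi, 5000)
omega1, omega2, phase = 1.0, 1.5, np.pi / 4      # arbitrary illustrative values

q1 = np.cos(omega1 * t)                          # oscillation in the first degree of freedom
q2 = np.cos(omega2 * t + phase)                  # oscillation in the second degree of freedom

plt.plot(q1, q2, lw=0.5)
plt.xlabel("q1")
plt.ylabel("q2")
plt.title("Lissajous figure for a 2:3 frequency ratio")
plt.show()

Rational frequency ratios give closed curves like this; irrational ratios produce orbits that never quite close up.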

More generally, one can think of this system as a point moving in a four-dimensional phase space defined by the coordinates q1, q2, p1 and p2; taking slices through this space reveals qualitatively similar types of orbit for, say, p2 and q2 as for p1 and q1. The motion of the system is confined to a lower-dimensional part of the phase space rather than filling up all the available phase space. In this particular case, because each degree of freedom moves in only one of its two available dimensions, the system as a whole moves in a two-dimensional part of the four-dimensional space.

This all applies to the original, unperturbed system. Hénon and Heiles took this simple model and modified it by adding a term to the Hamiltonian that was cubic rather than quadratic and which coupled the two degrees of freedom together. For those of you interested in the details, their Hamiltonian was of the form

H=\frac{1}{2} \left( q_1^2+p_1^2 + q_2^2+p_2^2\right) +q_1^2q_2 - \frac{1}{3}q_2^3

 

The first set of terms in the brackets is the unmodified form, describing a simple harmonic oscillator; the other two terms are new. The result of this simple alteration is really quite surprising. They found that, for low energies, the system continued to behave like two uncoupled oscillators; the orbits were smooth and well-behaved. This is not surprising because the cubic modifications are smaller than the original quadratic terms if the amplitude is small.  For higher energies the motion becomes a bit more complicated, but the phase space behaviour is still characterized by continuous lines, as shown in the left hand part of the following figure.

 

However, at higher values of the energy (right), the cubic terms become more important, and something very striking happens. A two-dimensional slice through the phase space no longer shows the continuous curves that typify the original system, but a seemingly disorganized scattering of dots. It is not possible to discern any pattern in the phase space structure of this system: it appears to be random.

 

Nowadays we describe the transition between these two types of behaviour as being accompanied by the onset of chaos. It is important to note that this system is entirely deterministic, but it generates a phase space pattern that is quite different from what one would naively expect from the behaviour usually associated with classical Hamiltonian systems. To understand how this comes about it is perhaps helpful to think about predictability in classical systems. It is true that precise knowledge of the state of a system allows one to predict its state at some future time. For a single particle this means that precise knowledge of its position and momentum, and knowledge of the relevant H, will allow one to calculate the position and momentum at all future times.

But think a moment about what this means. What do we mean by precise knowledge of the particle’s position? How precise? How many decimal places? If one has to give the position exactly then that could require an infinite amount of information. Clearly we never have that much information. Everything we know about the physical world has to be coarse-grained to some extent, even if it is only limited by measurement error. Strict determinism in the form advocated by Laplace is clearly a fantasy. Determinism is not the same as predictability.

In “simple” Hamiltonian systems what happens is that two neighbouring phase-space paths separate from each other in a very controlled way as the system evolves. In fact the separation between paths usually grows proportionally to time. The coarse-graining with which the input conditions are specified thus leads to a similar level of coarse-graining in the output state. Effectively the system is predictable, since the uncertainty in the output is not much larger than in the input.

In the chaotic system things are very different. What happens here is that the non-linear interactions represented in the Hamiltonian play havoc with the initial coarse-graining. Phase-space orbits that start out close to each other separate extremely violently (typically exponentially) and in a way that varies from one part of the phase space to another. What happens then is that particle paths become hopelessly scrambled and the mapping between initial and final states becomes too complex to handle. What comes out at the end is practically impossible to predict.
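To make this concrete, here is a rough Python sketch (my own, with an arbitrarily chosen starting point rather than anything from the original study) that integrates the Hénon-Heiles equations of motion for two initially almost identical states and prints their phase-space separation; in the chaotic regime the separation grows roughly exponentially until it saturates:

import numpy as np
from scipy.integrate import solve_ivp

def henon_heiles(t, y):
    # y = [q1, q2, p1, p2]; Hamilton's equations for the Henon-Heiles Hamiltonian above
    q1, q2, p1, p2 = y
    return [p1, p2, -(q1 + 2.0 * q1 * q2), -(q2 + q1 ** 2 - q2 ** 2)]

y0 = np.array([0.0, 0.3, 0.45, 0.0])             # illustrative initial conditions
y1 = y0 + np.array([1.0e-8, 0.0, 0.0, 0.0])      # a near-identical neighbour

t_eval = np.linspace(0.0, 100.0, 2001)
sol0 = solve_ivp(henon_heiles, (0.0, 100.0), y0, t_eval=t_eval, rtol=1e-10, atol=1e-12)
sol1 = solve_ivp(henon_heiles, (0.0, 100.0), y1, t_eval=t_eval, rtol=1e-10, atol=1e-12)

separation = np.linalg.norm(sol0.y - sol1.y, axis=0)   # distance between the two trajectories
for ti, d in zip(t_eval[::400], separation[::400]):
    print("t = %6.1f   separation = %.3e" % (ti, d))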

The Academic Journal Racket

Posted in Open Access, Science Politics with tags , , , , , on November 18, 2009 by telescoper

I’ve had this potential rant simmering away at the back of my mind for a while now, since our last staff meeting to be precise.  In common, I suspect, with many other physics and astronomy departments, here at Cardiff we’re bracing ourselves for an extended period of budget cuts to help pay for our government’s charitable donations of taxpayer’s money to the banking sector.

English universities are currently making preparations for a minimum 10% reduction in core funding, and many are already making significant numbers of redundancies. We don’t know what’s going to happen to us here in Wales yet, but I suspect it will be very bad indeed.

Anyway, one of the items of expenditure that has been identified as a source of savings as we try to tighten our collective belts is the cost of academic journals.  I nearly choked when the Head of School revealed how much we spend per annum on some of the journal subscriptions for physics and astronomy.  In fact, I think university and departmental libraries are being taken to the cleaners by the academic publishing industry and it’s time to make a stand.

Let me single out one example. Like many learned societies, the Institute of Physics (the professional organisation for British physicists) basically operates like a charity. It does, however, have an independent publishing company that is run as a profit-making enterprise. And how.

In 2009 we paid almost £30K (yes, THIRTY THOUSAND POUNDS) for a year’s subscription to the IOP Physics package, a bundled collection  of mainstream physics journals. This does not include Classical and Quantum Gravity or the Astrophysical Journal (both of which I have published in occasionally) which require additional payments running into thousands of pounds.

The IOP is not the only learned society to play this game. The Royal Astronomical Society also has a journal universally known as MNRAS (Monthly Notices of the Royal Astronomical Society) which earns it a considerable amount of revenue from its annual subscription of over £4K per department. Indeed, I don’t think it is inaccurate to say that without the income from MNRAS the RAS itself would face financial oblivion. I dare say MNRAS also earns a tidy sum for its publisher, Wiley.

If you’re not already shocked by the cost of these subscriptions, let me outline the way the academic journal business works, at least in the fields of physics and astronomy. I hope then you’ll agree that we’re being taken to the cleaners.

First, there is the content. This consists of scientific papers submitted to the journal by researchers, usually (though not exclusively) university employees. If the paper is accepted for publication the author receives no fee whatsoever and in some cases even has to pay “page charges” for the privilege of seeing the paper in print. In return for no fee, the author also has to sign over the copyright for the manuscript to the publisher. This is entirely different from the commercial magazine  market, where contributors are usually paid a fee for writing a piece, or  book publishing, where authors get a royalty on sales (and sometimes an advance).

Next there is the editorial process. The purpose of an academic journal – if there is one – is to ensure that only high quality papers are published. To this end it engages a Board of Editors to oversee this aspect of its work. The Editors are again usually academics and, with a few exceptions, they undertake the work on an unpaid basis. When a paper that lies within the area of expertise of a particular editor arrives at the journal, he or she identifies one or more suitable referees drawn from the academic community to provide advice on whether to publish it. The referees are expected to read the paper and provide comments as well as detailed suggestions for changes. The fee for referees? You guessed it. Zilch. Nada.

The final part of the business plan is to sell the content (supplied for free), suitably edited (for free) and refereed (for free) back to the universities  paying the wages of the people who so generously donated their labour. Not just sell, of course, but sell at a grossly inflated price.

Just to summarise, then: academics write the papers, do the refereeing and provide the editorial oversight for free and we then buy back the product of our labours at an astronomical price. Why do we participate in this ridiculous system? Am I the only one who detects the whiff of rip-off? Isn’t it obvious that we (I mean academics in universities) are spending a huge amount of time and money achieving nothing apart from lining the pockets of these exploitative publishers?

And if that wasn’t bad enough, there’s also the matter of inflation. There used to be a myth that advances in technology should lead to cheaper publishing. Nowadays authors submit their manuscripts electronically, they are sent electronically to referees and they are typeset automatically if and when accepted. Most academics now access journals online rather than through paper copies; in fact some publications are only published electronically these days. All this may well lead to cheaper publishing but it doesn’t lead to cheaper subscriptions. The forecast inflation rate for physics journals over this year is about 8.5%, way above the Retail Price Index, which is currently negative.

Where is all the money going? Right into the pockets of the journal publishers. Times are tough enough in the university sector without us giving tens of thousands of pounds per year, plus free editoral advice and the rest, to these rapacious companies. Enough is enough.

It seems to me that it would be a very easy matter to get rid of academic journals entirely (at least from the areas of physics and astronomy that I work in). For a start, we have an excellent free repository (the arXiv) where virtually every new research paper is submitted. There is simply no reason why we should have to pay for journal subscriptions when papers are publicly available there. In the old days, the journal industry had to exist in order for far-flung corners of the world to have access to the latest research. Now everyone with an internet connection can get it all. Journals are redundant.

The one thing the arXiv does not do is provide editorial control, which some people argue is why we have to carry on being fleeced in the way I have described. If there is no quality imprint from an established journal how else would researchers know which papers to read? There is a lot of dross out there.

For one thing, not all referees put much effort into their work so there’s a lot of dross in refereed journals anyway. And, frustratingly, many referees sit on papers for months on end before sending in a report that’s only a couple of sentences. Far better, I would say, to put the paper on the arXiv and let others comment on it, either privately with the authors or publicly – perhaps each arXiv entry should have a comments facility, like a blog, so that the paper could be discussed interactively. The internet is pushing us in a direction in which the research literature should be discussed much more openly than it is at present, and in which it evolves much more as a result of criticisms and debate.

Finally, the yardstick by which research output is now being measured – or at least one of the metrics – is not so much a count of the number of refereed papers, but the number of citations the papers have attracted. Papers begin to attract citations – through the arXiv – long before they appear in a refereed journal and good papers get cited regardless of where they are eventually published.

If you look at citation statistics for refereed journals you will find it very instructive. A sizeable fraction of papers published in the professional literature receive no citations at all in their lifetime. So we end up paying over the odds for papers that nobody even bothers to read. Madness.

It could be possible for the arXiv (or some future version of it) to have its own editorial system, with referees asked to vet papers voluntarily. I’d be much happier giving my time in this way for a non-profit making system than I am knowing that I’m aiding and abetting racketeers. However, I think I probably prefer the more libertarian solution. Put it all on the net with minimal editorial control and the good stuff will float to the top regardless of how much crud there is.

Anyway, to get back to the starting point of this post, we have decided to cancel a large chunk of our journal subscriptions, including the IOP Physics package which is costing us an amount close to the annual salary of  a lecturer. As more and more departments decide not to participate in this racket, no doubt the publishers will respond by hiking the price for the remaining customers. But it seems to me that this lunacy will eventually have to come to an end.

And if the UK university sector has to choose over the next few years between sacking hundreds of academic staff and ditching its voluntary subsidy to the publishing industry, I know what I would pick…

Ergodic Means…

Posted in The Universe and Stuff with tags , , , , , , on October 19, 2009 by telescoper

The topic of this post is something I’ve been wondering about for quite a while. This afternoon I had half an hour spare after a quick lunch so I thought I’d look it up and see what I could find.

The word ergodic is one you will come across very frequently in the literature of statistical physics, and in cosmology it also appears in discussions of the analysis of the large-scale structure of the Universe. I’ve long been puzzled as to where it comes from and what it actually means. Turning to the excellent Oxford English Dictionary Online, I found the answer to the first of these questions. Well, sort of. Under etymology we have

ad. G. ergoden (L. Boltzmann 1887, in Jrnl. f. d. reine und angewandte Math. C. 208), f. Gr.

I say “sort of” because it does attribute the origin of the word to Ludwig Boltzmann, but the Greek roots (εργον and οδος) appear to suggest it means “workway” or something like that. I don’t think I follow an ergodic path on my way to work so it remains a little mysterious.

The actual definitions of ergodic given by the OED are

Of a trajectory in a confined portion of space: having the property that in the limit all points of the space will be included in the trajectory with equal frequency. Of a stochastic process: having the property that the probability of any state can be estimated from a single sufficiently extensive realization, independently of initial conditions; statistically stationary.

As I had expected, it has two meanings which are related, but which apply in different contexts. The first is to do with paths or orbits, although in physics this is usually taken to mean trajectories in phase space (including both positions and velocities) rather than just three-dimensional position space. However, I don’t think the OED has got it right in saying that the system visits all positions with equal frequency. I think an ergodic path is one that must visit all positions within a given volume of phase space rather than being confined to a lower-dimensional piece of that space. For example, the path of a planet under the inverse-square law of gravity around the Sun is confined to a one-dimensional ellipse. If the force law is modified by external perturbations then the path need not be as regular as this, in extreme cases wandering around in such a way that it never joins back on itself but eventually visits all accessible locations. As far as my understanding goes, however, it doesn’t have to visit them all with equal frequency. The ergodic property of orbits is intimately associated with the presence of chaotic dynamical behaviour.

The other definition relates to stochastic processes, i.e. processes involving some sort of random component. These could either consist of a discrete collection of random variables {X1…Xn} (which may or may not be correlated with each other) or a continuously fluctuating function of some parameter such as time t, i.e. X(t), or of spatial position (or perhaps both).

Stochastic processes are quite complicated measure-valued mathematical entities because they are specified by probability distributions. What the ergodic hypothesis means in the second sense is that measurements extracted from a single realization of such a process have a definite relationship to analogous quantities defined by the probability distribution.

I always think of a stochastic process being like a kind of algorithm (whose workings we don’t know). Put it on a computer, press “go” and it spits out a sequence of numbers. The ergodic hypothesis means that by examining a sufficiently long run of the output we could learn something about the properties of the algorithm.

An alternative way of thinking about this for those of you of a frequentist disposition is that the probability average is taken over some sort of statistical ensemble of possible realizations produced by the algorithm, and this must match the appropriate long-term average taken over one realization.

This is actually quite a deep concept and it can apply (or not) in various degrees.  A simple example is to do with properties of the mean value. Given a single run of the program over some long time T we can compute the sample average

\bar{X}_T\equiv \frac{1}{T} \int_0^Tx(t) dt

The probability average, on the other hand, is defined over the probability distribution, which we can call p(x):

\langle X \rangle \equiv \int x p(x) dx

If these two are equal for sufficiently long runs, i.e. as T goes to infinity, then the process is said to be ergodic in the mean. A process could, however, be ergodic in the mean but not ergodic with respect to some other property of the distribution, such as the variance. Strict ergodicity would require that the entire frequency distribution defined from a long run should match the probability distribution to some accuracy.
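As a toy illustration (my own example, using a simple stationary autoregressive process rather than anything discussed above), one can check ergodicity in the mean numerically by comparing a long time average with an ensemble average:

import numpy as np

rng = np.random.default_rng(42)

def ar1(n, phi=0.9, sigma=1.0):
    # A stationary AR(1) process, x[t] = phi*x[t-1] + noise, which is ergodic in the mean
    noise = rng.normal(scale=sigma, size=n)
    x = np.empty(n)
    x[0] = rng.normal(scale=sigma / np.sqrt(1.0 - phi ** 2))   # start in the stationary state
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

time_average = ar1(100_000).mean()                                  # one long run
ensemble_average = np.mean([ar1(200)[-1] for _ in range(2_000)])    # many runs, one time each

print(time_average, ensemble_average)   # both should be close to the true mean of zero

Both numbers converge on zero as the run length and the number of realizations grow, which is exactly what ergodicity in the mean requires.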

Now we have a problem with the OED again. According to the defining quotation given above, ergodic can be taken to mean statistically stationary. Actually, that’s not true.

In the one-parameter case, “statistically stationary” means that the probability distribution controlling the process is independent of time, i.e. that p(x,t)=p(x,t+Δt) . It’s fairly straightforward to see that the ergodic property requires that a process X(t) be stationary, but the converse is not the case. Not every stationary process is necessarily ergodic. Ned Wright gives an example here. For a higher-dimensional process, such as a spatially-fluctuating random field the analogous property is statistical homogeneity, rather than stationarity, but otherwise everything carries over.

Ergodic theorems are very tricky to prove in general, but there are well-known results that rigorously establish the ergodic properties of Gaussian processes (which is another reason why theorists like myself like them so much). However, it should be mentioned that even if the ergodic assumption applies its usefulness depends critically on the rate of convergence. In the time-dependent example I gave above, it’s no good if the averaging period required is much longer than the age of the Universe; in that case even ergodicity makes it difficult to make inferences from your sample. Likewise the ergodic hypothesis doesn’t help you analyse your galaxy redshift survey if the averaging scale needed is larger than the depth of the sample.

Moreover, it seems to me that many physicists resort to ergodicity when there are no compelling mathematical grounds for thinking that it is true. In some versions of the multiverse scenario, it is hypothesized that the fundamental constants of nature describing our low-energy world turn out “randomly” to take on different values in different domains owing to some sort of spontaneous symmetry breaking, perhaps associated with a phase transition generating cosmic inflation. We happen to live in a patch within this structure where the constants are such as to make human life possible. There’s no need to assert that the laws of physics have been designed to make us possible if this is the case, as most of the multiverse doesn’t have the fine tuning that appears to be required to allow our existence.

As an application of the Weak Anthropic Principle, I have no objection to this argument. However, behind this idea lies the assertion that all possible vacuum configurations (and all related physical constants) do arise ergodically. I’ve never seen anything resembling a proof that this is the case. Moreover, there are many examples of physical phase transitions for which the ergodic hypothesis is known not to apply.  If there is a rigorous proof that this works out, I’d love to hear about it. In the meantime, I remain sceptical.

Alarm Bells at STFC

Posted in Science Politics with tags , , on September 30, 2009 by telescoper

The  financial catastrophe engulfing the Science and Technology Facilities Council (STFC) has suddenly reared its (very ugly) head again.

Here is a statement posted yesterday on their webpage.

STFC Council policy on grants

STFC Council examined progress of its current science and technology prioritisation exercise at a strategy session on 21 and 22 September. Without prejudging the outcome of the prioritisation, Council agreed that prudent financial management required a re-examination of upcoming grants.

Council therefore agreed that new grants will be issued only to October 2010 in the first instance. This temporary policy is in place pending the outcome of the prioritisation exercise, expected in the New Year.

According to the e-astronomer the  STFC  has written to all Vice-chancellors and Principals of UK universities to tell them about this move. I gather the intention is that this measure will be temporary, but it looks deeply ominous to me. Those of us whose rolling grant requests for  5 years from April 2010 are currently being assessed face the possibility of receiving grants for only 6 months of funding. On the other hand, I’m told that what is more likely is that our grant won’t be announced until January or February, after the hitlist prioritisation exercise has been completed in the New Year. Hardest hit will be the particle physicists whose rolling grants start on 1st October 2009 (tomorrow), which will have only a year’s funding on them…

It seems that STFC has finally realised the scale of its budgetary problems and payback time is looming. I honestly think we could be doomed…

Index Rerum

Posted in Biographical, Science Politics with tags , , , , , , , , , on September 29, 2009 by telescoper

Following on from yesterday’s post about the forthcoming Research Excellence Framework that plans to use citations as a measure of research quality, I thought I would have a little rant on the subject of bibliometrics.

Recently one particular measure of scientific productivity has established itself as the norm for assessing job applications, grant proposals and for other related tasks. This is called the h-index, named after the physicist Jorge Hirsch, who introduced it in a paper in 2005. This is quite a simple index to define and to calculate (given an appropriately accurate bibliographic database). The definition  is that an individual has an h-index of  h if that individual has published h papers with at least h citations. If the author has published N papers in total then the other N-h must have no more than h citations. This is a bit like the Eddington number.  A citation, as if you didn’t know,  is basically an occurrence of that paper in the reference list of another paper.

To calculate it is easy. You just go to the appropriate database – such as the NASA ADS system – search for all papers with a given author and request the results to be returned sorted by decreasing citation count. You scan down the list until the number of citations falls below the position in the ordered list.
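In code the whole procedure takes a few lines; here is a Python sketch, with made-up citation counts purely for illustration:

def h_index(citations):
    # h is the largest number such that h papers have at least h citations each
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

citations = [50, 30, 22, 15, 8, 8, 5, 3, 1, 0]   # invented citation counts for ten papers
print(h_index(citations))                        # 6: six papers with at least six citations each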

Incidentally, one of the issues here is whether to count only refereed journal publications or all articles (including books and conference proceedings). The argument in favour of the former is that the latter are often of lower quality. I think that is an illogical argument because good papers will get cited wherever they are published. Related to this is the fact that some people would like to count “high-impact” journals only, but if you’ve chosen citations as your measure of quality the choice of journal is irrelevant. Indeed a paper that is highly cited despite being in a lesser journal should if anything be given a higher weight than one with the same number of citations published in, e.g., Nature. Of course it’s just a matter of time before the hideously overpriced academic journals run by the publishing mafia go out of business anyway, so before long this question will simply vanish.

The h-index has some advantages over more obvious measures, such as the average number of citations, as it is not skewed by one or two publications with enormous numbers of hits. It also, at least to some extent, represents both quantity and quality in a single number. For whatever reasons in recent times h has undoubtedly become common currency (at least in physics and astronomy) as being a quick and easy measure of a person’s scientific oomph.

Incidentally, it has been claimed that this index can be fitted well by a formula h ~ sqrt(T)/2 where T is the total number of citations. This works in my case. If it works for everyone, doesn’t  it mean that h is actually of no more use than T in assessing research productivity?

Typical values of h vary enormously from field to field – even within each discipline – and vary a lot between observational and theoretical researchers. In extragalactic astronomy, for example, you might expect a good established observer to have an h-index around 40 or more whereas some other branches of astronomy have much lower citation rates. The top dogs in the field of cosmology are all theorists, though. People like Carlos Frenk, George Efstathiou, and Martin Rees all have very high h-indices.  At the extreme end of the scale, string theorist Ed Witten is in the citation stratosphere with an h-index well over a hundred.

I was tempted to put up examples of individuals’ h-numbers but decided instead just to illustrate things with my own. That way the only person to get embarrassed is me. My own index value is modest – to say the least – at a meagre 27 (according to ADS). Does that mean Ed Witten is four times the scientist I am? Of course not. He’s much better than that. So how exactly should one use h as an actual metric, for allocating funds or prioritising job applications, and what are the likely pitfalls? I don’t know the answer to the first question, but I have some suggestions for other metrics that avoid some of its shortcomings.

One of these addresses an obvious deficiency of h. Suppose we have an individual who writes one brilliant paper that gets 100 citations and another who is one author amongst 100 on another paper that has the same impact. In terms of total citations, both papers register the same value, but there’s no question in my mind that the first case deserves more credit. One remedy is to normalise the citations of each paper by the number of authors, essentially sharing the citations equally between all those that contributed to the paper. This is also quite easy to do on ADS, and in my case it gives a value of 19. Trying the same thing on various other astronomers, astrophysicists and cosmologists reveals that the h-index of an observer is likely to fall by a factor of 3-4 when calculated in this way, whereas theorists (who generally work in smaller groups) suffer less. I imagine Ed Witten’s index doesn’t change much when calculated on a normalised basis, although I haven’t calculated it myself.
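
Here is a minimal sketch of that normalisation, assuming you have the number of authors for each paper to hand; the (citations, authors) pairs below are invented purely for illustration.

```python
def h_index(scores):
    """h-index for a list of (possibly fractional) per-paper citation scores."""
    ranked = sorted(scores, reverse=True)
    return sum(1 for rank, s in enumerate(ranked, start=1) if s >= rank)

# Invented (citations, number of authors) pairs
papers = [(100, 1), (100, 100), (60, 3), (40, 2), (25, 5), (12, 1)]

raw        = h_index([cites for cites, _ in papers])
normalised = h_index([cites / authors for cites, authors in papers])

# The normalised version shares each paper's citations equally among its authors
print(raw, normalised)
```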

Observers  complain that this normalized measure is unfair to them, but I’ve yet to hear a reasoned argument as to why this is so. I don’t see why 100 people should get the same credit for a single piece of work:  it seems  like obvious overcounting to me.

Another possibility – if you want to measure leadership too – is to calculate the h index using only those papers on which the individual concerned is the first author. This is  a bit more of a fiddle to do but mine comes out as 20 when done in this way.  This is considerably higher than most of my professorial colleagues even though my raw h value is smaller. Using first author papers only is also probably a good way of identifying lurkers: people who add themselves to any paper they can get their hands on but never take the lead. Mentioning no names of  course.  I propose using the ratio of  unnormalized to normalized h-indices as an appropriate lurker detector…
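
In the same spirit, here is a rough sketch of the first-author-only index and the proposed lurker detector, again using invented records; the only extra information needed is whether the individual concerned is first author on each paper.

```python
def h_index(scores):
    ranked = sorted(scores, reverse=True)
    return sum(1 for rank, s in enumerate(ranked, start=1) if s >= rank)

# Invented records: (citations, number of authors, first author?)
papers = [(100, 1, True), (100, 100, False), (60, 3, True),
          (40, 2, False), (25, 5, True), (12, 1, True)]

h_raw        = h_index([c for c, _, _ in papers])
h_normalised = h_index([c / n for c, n, _ in papers])
h_first      = h_index([c for c, _, first in papers if first])

print(h_first)                # leadership: h computed from first-author papers only
print(h_raw / h_normalised)   # "lurker detector": near 1 for lead authors, large for
                              # those who mostly tag along on big collaborations
```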

Finally in this list of bibliometrica is the so-called g-index. This is defined in a slightly more complicated way than h: given a set of articles ranked in decreasing order of citation numbers, g is defined to be the largest number such that the top g articles altogether received at least g² citations. This is a bit like h but takes extra account of the average citations of the top papers. My own g-index is about 47. Obviously I like this one because my number looks bigger, but I’m pretty confident others go up even more than mine!
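
The g-index is just as easy to compute from the same ranked list. A short sketch, once more with made-up citation counts:

```python
from itertools import accumulate

def g_index(citations):
    """Largest g such that the top g papers together have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    for rank, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= rank ** 2:   # cumulative citations of the top `rank` papers
            g = rank
    return g

# Made-up citation counts, purely for illustration
print(g_index([40, 25, 12, 8, 5, 2, 1, 0, 0, 0, 0, 0]))   # 9, compared with an h of 5 for the same list
```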

Of course you can play with these things to your heart’s content, combining ideas from each definition: the normalized g-factor, for example. The message is, though, that although h definitely contains some information, any attempt to condense such complicated information into a single number is never going to be entirely successful.

Comments, particularly with suggestions of alternative metrics, are welcome via the box. Even from lurkers.

Cosmic Haiku

Posted in Poetry, The Universe and Stuff with tags , , , on September 6, 2009 by telescoper

I haven’t had much time to post today and will probably be too busy next week for anything too substantial, so I thought I’d resort to a bit of audience participation. How about a few Haiku on themes connected to astronomy, cosmology or physics?

Don’t be worried about making the style of your contributions too authentic; just make sure they are 17 syllables in total, split into three lines of 5, 7 and 5 syllables respectively.

Here’s a few of my own to give you an idea!

Quantum Gravity:
The troublesome double-act
Of Little and Large

Gravity’s waves are
Traceless; which does not mean they
Can never be found

The Big Bang wasn’t
So big, at least not when you
Think in decibels.

Cosmological
Constant and Dark Energy
Are vacuous names

Microwave Background
Photons remember a time
When they were hotter

Isotropic and
Homogeneous metric?
Robertson-Walker

Galaxies evolve
In a complicated way
We don’t understand

Acceleration:
Type Ia Supernovae
Gave us the first clue

Cosmic Inflation
Could have stretched the Universe
And made it flatter

Astrophysicist
Is what I’m told is my Job
Title. Whatever.

Contributions welcome via the comments box. The best one gets a chance to win Bully’s star prize.

The Inductive Detective

Posted in Bad Statistics, Literature, The Universe and Stuff with tags , , , , , , , on September 4, 2009 by telescoper

I was watching an old episode of Sherlock Holmes last night – from the classic  Granada TV series featuring Jeremy Brett’s brilliant (and splendidly camp) portrayal of the eponymous detective. One of the  things that fascinates me about these and other detective stories is how often they use the word “deduction” to describe the logical methods involved in solving a crime.

As a matter of fact, what Holmes generally uses is not really deduction at all, but inference (a process which is predominantly inductive).

In deductive reasoning, one tries to tease out the logical consequences of a premise; the resulting conclusions are, generally speaking, more specific than the premise. “If these are the general rules, what are the consequences for this particular situation?” is the kind of question one can answer using deduction.

The kind of reasoning Holmes employs, however, is essentially the opposite of this. The question being answered is of the form: “From a particular set of observations, what can we infer about the more general circumstances relating to them?”. The following example from A Study in Scarlet is exactly of this type:

From a drop of water a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other.

The word “possibility” makes it clear that no certainty is attached to the actual existence of either the Atlantic or Niagara, but the implication is that observations of (and perhaps experiments on) a single water drop could allow one to infer enough of the general properties of water to deduce the possible existence of other phenomena. The fundamental process is inductive rather than deductive, although deductions do play a role once general rules have been established.

In the example quoted there is  an inductive step between the water drop and the general physical and chemical properties of water and then a deductive step that shows that these laws could describe the Atlantic Ocean. Deduction involves going from theoretical axioms to observations whereas induction  is the reverse process.

I’m probably labouring this distinction, but the main point of doing so is that a great deal of science is fundamentally inferential and, as a consequence, it entails dealing with inferences (or guesses or conjectures) that are inherently uncertain as to their application to real facts. Dealing with these uncertain aspects requires a more general kind of logic than the  simple Boolean form employed in deductive reasoning. This side of the scientific method is sadly neglected in most approaches to science education.

In physics, the attitude is usually to establish the rules (“the laws of physics”) as axioms (though perhaps giving some experimental justification). Students are then taught to solve problems which generally involve working out particular consequences of these laws. This is all deductive. I’ve got nothing against this: it is what a great deal of theoretical research in physics is actually like, and it forms an essential part of the training of a physicist.

However, one of the aims of physics – especially fundamental physics – is to try to establish what the laws of nature actually are from observations of particular outcomes. It would be simplistic to say that this was entirely inductive in character. Sometimes deduction plays an important role in scientific discoveries. For example, Albert Einstein deduced his Special Theory of Relativity from a postulate that the speed of light was constant for all observers in uniform relative motion. However, the motivation for this entire chain of reasoning arose from previous studies of electromagnetism, which involved a complicated interplay between experiment and theory that eventually led to Maxwell’s equations. Deduction and induction are both involved at some level in a kind of dialectical relationship.

The synthesis of the two approaches requires an evaluation of the evidence the data provides concerning the different theories. This evidence is rarely conclusive, so a wider range of logical possibilities than “true” or “false” needs to be accommodated. Fortunately, there is a quantitative and logically rigorous way of doing this. It is called Bayesian probability. In this way of reasoning, the probability (a number between 0 and 1 attached to a hypothesis, model, or anything that can be described as a logical proposition of some sort) represents the extent to which a given set of data supports the given hypothesis. The calculus of probabilities only reduces to Boolean algebra when the probabilities of all hypotheses involved are either unity (certainly true) or zero (certainly false). In between “true” and “false” there are varying degrees of “uncertain”, represented by a number between 0 and 1, i.e. the probability.
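
As a toy illustration of what this means in practice, here is a minimal sketch of Bayes’ theorem applied to two rival hypotheses; the priors and likelihoods are invented purely for the example.

```python
# Toy Bayesian update: two rival hypotheses H1 and H2 and a single piece of data D.
# All the numbers are invented for illustration.

prior = {"H1": 0.5, "H2": 0.5}         # degrees of belief before seeing the data
likelihood = {"H1": 0.8, "H2": 0.2}    # P(D | H): how probable the data is under each hypothesis

# Bayes' theorem: P(H | D) = P(D | H) P(H) / P(D), where P(D) sums over the hypotheses
evidence = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

# H1 ends up four times as probable as H2; neither is declared "true" or "false"
print(posterior)
```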

Overlooking the importance of inductive reasoning has led to numerous pathological developments that have hindered the growth of science. One example is the widespread and remarkably naive devotion that many scientists have towards the philosophy of the anti-inductivist Karl Popper; his doctrine of falsifiability has led to an unhealthy neglect of  an essential fact of probabilistic reasoning, namely that data can make theories more probable. More generally, the rise of the empiricist philosophical tradition that stems from David Hume (another anti-inductivist) spawned the frequentist conception of probability, with its regrettable legacy of confusion and irrationality.

My own field of cosmology provides the largest-scale illustration of this process in action. Theorists make postulates about the contents of the Universe and the laws that describe it and try to calculate what measurable consequences their ideas might have. Observers make measurements as best they can, but these are inevitably restricted in number and accuracy by technical considerations. Over the years, theoretical cosmologists deductively explored the possible ways Einstein’s General Theory of Relativity could be applied to the cosmos at large. Eventually a family of theoretical models was constructed, each of which could, in principle, describe a universe with the same basic properties as ours. But determining which, if any, of these models applied to the real thing required more detailed data. For example, observations of the properties of individual galaxies led to the inferred presence of cosmologically important quantities of dark matter. Inference also played a key role in establishing the existence of dark energy as a major part of the overall energy budget of the Universe. The result is that we have now arrived at a standard model of cosmology which accounts pretty well for most relevant data.

Nothing is certain, of course, and this model may well turn out to be flawed in important ways. All the best detective stories have twists in which the favoured theory turns out to be wrong. But although the puzzle isn’t exactly solved, we’ve got good reasons for thinking we’re nearer to at least some of the answers than we were 20 years ago.

I think Sherlock Holmes would have approved.