9.6 Von Neumann's Postulate and Bell’s Freedom

If I have freedom in my love,

And in my soul am free,

Angels alone, that soar above,

Enjoy such liberty.

Richard Lovelace, 1649

In quantum mechanics the condition of a physical system is represented by a state vector, which encodes the probabilities of each possible result of whatever measurements we may perform on the system. Since the probabilities are usually neither 0 nor 1, it follows that for a given system with a specific state vector, the results of measurements generally are not uniquely determined. Instead, there is a set (or range) of possible results, each with a specific probability. Furthermore, according to the conventional interpretation of quantum mechanics (the so-called Copenhagen Interpretation of Niels Bohr, et al), the state vector is the most complete possible description of the system, which implies that nature is fundamentally probabilistic (i.e., non-deterministic). However, some physicists have questioned whether this interpretation is correct, and whether there might be some more complete description of a system, such that a fully specified system would respond deterministically to any measurement we might perform. Such proposals are called 'hidden variable' theories.

In his assessment of hidden variable theories in 1932, John von Neumann pointed out a set of five assumptions which, if we accept them, imply that no hidden variable theory can possibly give deterministic results for all measurements. The first four of these assumptions are fairly unobjectionable, but the fifth seems much more arbitrary, and has been the subject of much discussion. (The parallel with Euclid's postulates, including the controversial fifth postulate discussed in Chapter 3.1, is striking.) To understand von Neumann's fifth postulate, notice that although the conventional interpretation does not uniquely determine the outcome of a particular measurement for a given state, it does predict a unique 'expected value' for that measurement. Let's say a measurement of X on a system with a state vector ϕ has an expected value denoted by <X;ϕ>, computed by simply adding up all the possible results multiplied by their respective probabilities. Not surprisingly, the expected values of observables are additive, in the sense that

In practice we can't generally perform a measurement of X+Y without disturbing the measurements of X and Y, so we can't measure all three observables on the same system. However, if we prepare a set of systems, all with the same initial state vector ϕ, and perform measurements of X+Y on some of them, and measurements of X or Y on the others, then the averages of the measured values of X, Y, and X+Y (over sufficiently many systems) will be related in accord with (1).

Remember that according to the conventional interpretation the state vector ϕ is the most complete possible description of the system. On the other hand, in a hidden variable theory the premise is that there are additional variables, and if we specify both the state vector ϕ AND the "hidden vector" H, the result of measuring X on the system is uniquely determined. In other words, if we let <X;ϕ,H> denote the expected value of a measurement of X on a system in the state (ϕ,H), then the claim of the hidden variable theorist is that the variance of individual measured values around this expected value is zero.

Now we come to von Neumann's controversial fifth postulate. He assumed that, for any hidden variable theory, just as in the conventional interpretation, the averages of X+Y, X and Y evaluated over a set of identical systems are additive. (Compare this with Galileo's assumption of simple additivity for the composition of incommensurate speeds.) Symbolically, this is expressed as

for any two observables X and Y. On this basis he proved that the variance ("dispersion") of at least one observable's measurements must be greater than zero. (Technically, he showed that there must be an observable X such that <X²> is not equal to <X>².) Thus, no hidden variable theory can uniquely determine the results of all possible measurements, and we are compelled to accept that nature is fundamentally non-deterministic.

However, this is all based on (2), the assumption of additivity for the expectations of identically prepared systems, so it's important to understand exactly what this assumption means. Clearly the words "identically prepared" mean something different under the conventional interpretation than they do in the context of a hidden variable theory. Conventionally, two systems are said to be identically prepared if they have the same state vector (ϕ), but in a hidden variable theory two states with the same state vector are not necessarily "identical", because they may have different hidden vectors (H).

Of course, a successful hidden variable theory must satisfy (1) (which has been experimentally verified), but must it necessarily satisfy (2)? Relation (1) implies that the averages of <X;ϕ,H>, etc, evaluated over all applicable hidden vectors H, leads to (1), but does it necessarily follow that (2) is satisfied for every (or even for ANY) specific value of H? To give a simple illustration, consider the following trivial set of data:

The averages over these four "conventionally indistinguishable" systems are <X;3> = 3, <Y;3> = 4, and <X+Y;3> = 7, so relation (1) holds. However, if we examine the "identically prepared" systems taking into account the hidden components of the state, we really have two different states (those with H=1 and those with H=2), and we find that the results are not additive (but they are deterministic) in these fully-defined states. Thus, equation (1) clearly doesn't imply equation (2). (If it did, von Neumann could have said so, rather than taking it as an axiom.)

Of course, if our hidden variable theory is always going to satisfy (1), we must have some constraints on the values of H that arise among "conventionally indistinguishable" systems. For example, in the above table if we happened to get a sequence of systems all in the same condition as System #1 we would always get the results X=2, Y=5, X+Y=5, which would violate (1). So, if (2) doesn't hold, then at the very least we need our theory to ensure a distribution of the hidden variables H that will make the average results over a set of "conventionally indistinguishable" systems satisfy relation (1). (In the simple illustration above, we would just need to ensure that the hidden variables are equally distributed between H=1 and H=2.)

In Bohm's 1952 theory the hidden variables consist of precise initial positions for the particles in the system – more precise than the uncertainty relations would typically allow us to determine - and the distribution of those variables within the uncertainty limits is governed as a function of the conventional state vector, ϕ. It's also worth noting that, in order to make the theory work, it was necessary for ϕ to be related to the values of H for separate particles instantaneously in an explicitly non-local way. Thus, Bohm's theory is a counter-example to von Neumann's theorem, but not to Bell's (see below).

Incidentally, it may be worth noting that if a hidden variable theory is valid, and the variance of all measurements around their expectations are zero, then the terms of (2) are not only the expectations, they are the unique results of measurements for a given ϕ and H. This implies that they are eigenvalues, of the respective operators, whereas the expectations for those operators are generally not equal to any of the eigenvalues. Thus, as Bell remarked, "[von Neumann's] 'very general and plausible postulate' is absurd".

Still, Gleason showed that we can carry through von Neumann's proof even on the weaker assumption that (2) applies to commuting variables. This weakened assumption has the advantage of not being self-evidently false. However, careful examination of Gleason's proof reveals that the non-zero variances again arise only because of the existence of non-commuting observables, but this time in a "contextual" sense that may not be obvious at first glance. To illustrate, consider three observables X,Y,Z. If X and Y commute and X and Z commute, it doesn't follow that Y and Z commute. We may be able to measure X and Y using one setup, and X and Z using another, but measuring the value of X and Y simultaneously will disturb the value of Z. Gleason's proof leads to non-zero variances precisely for measurements in such non-commuting contexts. It's not hard to understand this, because in a sense the entire non-classical content of quantum mechanics is the fact that some observables do not commute. Thus it's inevitable that any "proof" of the inherent non-classicality of quantum mechanics must at some point invoke non-commuting measurements, but it's precisely at that point where linear additivity can only be empirically verified on an average basis, not a specific basis. This, in turn, leaves the door open for hidden variables to govern the individual results.

Notice that in a "contextual" theory the result of an experiment is understood to depend not only on the deterministic state of the "test particles" but also on the state of the experimental apparatus used to make the measurements, and these two can influence each other. Thus, Bohm's 1952 theory escaped the no hidden variable theorems essentially by allowing the measurements to have an instantaneous effect on the hidden variables, which, of course, made the theory essentially non-local as well as non-relativistic (although Bohm and others later worked to relativize his theory).

Ironically, the importance of considering the entire experimental setup (rather than just the arbitrarily identified "test particles") was emphasized by Niels Bohr himself, and it's a fundamental feature of quantum mechanics (i.e., objects are influenced by measurements no less than measurements are influenced by objects). As Bell said, even Gleason's relatively robust line of reasoning overlooks this basic insight. Of course, it can be argued that contextual theories are somewhat contrived and not entirely compatible with the spirit of hidden variable explanations, but, if nothing else, they serve to illustrate how difficult it is to categorically rule out "all possible" hidden variable theories based simply on the structure of the quantum mechanical state space.

In 1963 John Bell sought to clarify matters, noting that all previous attempts to prove the impossibility of hidden variable interpretations of quantum mechanics had been “found wanting”. His idea was to establish rigorous limits on the kinds of statistical correlations that could possibly exist between spatially separate events under the assumption of determinism and what might be called “local realism”, which he took to be the premises of Einstein, et al. At first Bell thought he had succeeded, but it was soon pointed out that his derivation implicitly assumed one other crucial ingredient, namely, the possibility of free choice. To see why this is necessary, notice that any two spatially separate events share a common causal past, consisting of the intersection of their past light cones. This implies that we can never categorically rule out some kind of "pre-arranged" correlation between spacelike-separated events - at least not unless we can introduce information that is guaranteed to be causally independent of prior events. The appearance of such "new events" whose information content is at least partially independent of their causal past, constitutes a free choice. If no free choice is ever possible, then (as Bell acknowledged) the Bell inequalities do not apply.

In summary, Bell showed that quantum mechanics is incompatible with a quite peculiar pair of assumptions, the first being that the future behavior of some particles (i.e., the "entangled" pairs) involved in the experiment is mutually conditioned and coordinated in advance, and the second being that such advance coordination is in principle impossible for other particles involved in the experiment (e.g., the measuring apparatus). These are not quite each others' logical negations, but close to it. One is tempted to suggest that the mention of quantum mechanics is almost superfluous, because Bell's result essentially amounts to a proof that the assumption of a strictly deterministic universe is incompatible with the assumption of a strictly non-deterministic universe. He proved, assuming the predictions of quantum mechanics are valid (which the experimental evidence strongly supports), that not all events can be strictly consequences of their causal pasts, and in order to carry out this proof he found it necessary to introduce the assumption that not all events are strictly consequences of their causal pasts!

Bell identified three possible positions (aside from “just ignore it”) that he thought could be taken with respect to the Aspect experiments: (1) detector inefficiencies are keeping us from seeing that the inequalities are not really violated, (2) there are influences going faster than light, or (3) the measuring angles are not free variables. Regarding the third possibility, he wrote:

...if our measurements are not independently variable as we supposed...even if chosen by apparently free-willed physicists... then Einstein local causality can survive. But apparently separate parts of the world become deeply entangled, and our apparent free will is entangled with them.

The third possibility clearly shows that Bell understood the necessity of assuming free acausal events for his derivation, but since this amounts to assuming precisely that which he was trying to prove, we must acknowledge that the significance of Bell's inequalities is less clear than many people originally believed. In effect, after clarifying the lack of significance of von Neumann's "no hidden variables proof" due to its assumption of what it meant to prove, Bell proceeded to repeat the mistake, albeit in a more subtle way. Perhaps Bell's most perspicacious remark was (in reference to Von Neumann's proof) that the only thing proved by impossibility proofs is the author's lack of imagination.

This all just illustrates that it's extremely difficult to think clearly about causation, and the reasons for this can be traced back to the Aristotelian distinction between natural and violent motion. Natural motion consisted of the motions of non-living objects, such as the motions of celestial objects, the natural flows of water and wind, etc. These are the kinds of motion that people (like Bell) apparently have in mind when they think of determinism. Following the ancients, many people tend to instinctively exempt "violent motions" – i.e., motions resulting from acts of living volition – when considering determinism. In fact, when Bell contemplated the possibility that determinism might also apply to himself and other living beings, he coined a different name for it, calling it “super-determinism”. Regarding the experimental tests of quantum entanglement he said

One of the ways of understanding this business is to say that the world is super-deterministic. That not only is inanimate nature deterministic, but we, the experimenters who imagine we can choose to do one experiment rather than another, are also determined. If so, the difficulty which this experimental result creates disappears.

But what Bell calls (admittedly on the spur of the moment) super-determinism is nothing other than what philosophers have always called simply determinism. Ironically, if confronted with the idea of vitalism, i.e., the notion that living beings are exempt from the normal laws of physics that apply to inanimate objects, or at least that living beings also entail some other kind of action transcending the normal laws of physics in physically observable ways – many physicists would probably be skeptical if not downright dismissive… and yet hardly any would think to question this very dualistic assumption underlying Bell’s analysis. Regardless of our conscious beliefs, it's psychologically very difficult for us to avoid bifurcating the world into inanimate objects that obey strict laws of causality, and animate objects (like ourselves) that do not. This dichotomy was historically appealing, and may even have been necessary for the development of classical physics, but it always left the nagging question of how or why we (and our constituent atoms) manage to evade the iron hand of determinism that governs everything else. This view affects our conception of science by suggesting to us that the experimenter is not himself part of nature, and is exempt from whatever determinism is postulated for the system being studied. Thus we imagine that we can "test" whether the universe is behaving deterministically by turning some dials and seeing how the universe responds, overlooking the fact that we and the dials are also part of the universe.

This immediately introduces "the measurement problem": Where do we draw the boundaries between separate phenomena? What is an observation? How do we distinguish "nature" from "violence", and is this distinction even warranted? When people say they're talking about a deterministic world, they're almost always not. What they're usually talking about is a deterministic sub-set of the world that can be subjected to freely chosen inputs from a non-deterministic "exterior". But just as with the measurement problem in quantum mechanics, when we think we've figured out the constraints on how a deterministic test apparatus can behave in response to arbitrary inputs, someone says "but isn't the whole lab a deterministic system?", and then the whole building, and so on. At what point does "the collapse of determinism" occur, so that we can introduce free inputs to test the system? Just as the infinite regress of the measurement problem in quantum mechanics leads to bewilderment, so too does the infinite regress of determinism.

The other loop-hole that can never be closed is what Bell called "correlation by post-arrangement" or "backwards causality". I'd prefer to say that the system may violate the assumption of strong temporal asymmetry, but the point is the same. Clearly the causal pasts of the spacelike separated arms of an EPR experiment overlap, so all the objects involved share a common causal past. Therefore, without something to "block off" this region of common past from the emission and absorption events in the EPR experiment, we're not justified in asserting causal independence, which is required for Bell's derivation. The usual and, as far as I know, only way of blocking off the causal past is by injecting some "other" influence, i.e., an influence other than the deterministic effects propagating from the causal past. This "other" may be true randomness, free will, or some other concept of "free occurrence". In any case, Bell's derivation requires us to assert that each measurement is a "free" action, independent of the causal past, which is inconsistent with even the most limited construal of determinism.

There is a fascinating parallel between the ancient concepts of natural and violent motion and the modern quantum mechanical concepts of the linear evolution of the wave function and the collapse of the wave function. These modern concepts are sometimes termed U, for unitary evolution of the quantum mechanical state vector, and R, for reduction of the state vector onto a particular basis of measurement or observation. One could argue that the U process corresponds closely with Aristotle's natural (inanimate) evolution, while the R process represents Aristotle's violent evolution, triggered by some living act. As always, we face the question of whether this is an accurate or meaningful bifurcation of events. Today there are several "non-collapse" interpretations of quantum mechanics, including the famous "many worlds" interpretation of Everett and DeWit. However, to date, none of these interpretations has succeeded in giving a completely satisfactory account of quantum mechanical processes, so we are not yet able to dispense with Aristotle's distinction between natural and violent motion.

Return to Table of Contents