The Dirac Equation 

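Quantum mechanics is based on a correspondence principle that maps classical dynamical variables to differential operators. From the classical equation of motion for a given object, expressed in terms of energy E and momentum p, the corresponding wave equation of quantum mechanics is given by making the replacements

$$E \;\to\; i\hbar\frac{\partial}{\partial t},\qquad p_x \;\to\; -i\hbar\frac{\partial}{\partial x},\qquad p_y \;\to\; -i\hbar\frac{\partial}{\partial y},\qquad p_z \;\to\; -i\hbar\frac{\partial}{\partial z}$$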

and then treating the resulting expression as a differential operator on the wave function of the object. For example, recall that the nonrelativistic momentum components of a particle are $p_x = mv_x$, etc., and the kinetic energy is $mv^2/2 = p^2/(2m)$, so the equation of motion for a free particle (i.e., no potential energy) is

$$E = \frac{p_x^2 + p_y^2 + p_z^2}{2m} \tag{1}$$

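where E is the total energy. Making the replacements noted above, and applying the resulting operators to the wave function ψ of the particle, this gives

$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\left(\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2}\right)$$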

which is the nonrelativistic Schrödinger equation of a free particle of mass m. This equation is valid only if the speed of the particle is small compared with the speed of light, because it was based on the nonrelativistic expression (1) for the energy. To cover relativistic speeds we must use the relativistic relation between energy and momentum, which (in units with c = 1, used throughout) is $E^2 = m^2 + p^2$. Thus we have

$$E^2 - p_x^2 - p_y^2 - p_z^2 - m^2 = 0 \tag{2}$$

If we were to replace E in this equation with the operator $i\hbar\,\partial/\partial t$, the resulting equation of motion would involve the second time derivative of the wave function, in contrast with the nonrelativistic Schrödinger equation, in which only the first time derivative of the wave function appears. In his book “The Principles of Quantum Mechanics” Dirac wrote that “we deduced from quite general arguments that the wave equation must be linear in the operator ∂/∂t”, and that an equation of motion involving the second time derivative would not be “of the form required by the general laws of the quantum theory”. The detailed justification of this statement involved the need for the probability density of the wave to be always positive. In later years Dirac described his motivations differently, explaining that he hadn’t pursued the quadratic form (leading to what is now called the Klein-Gordon equation) because it seemed inconsistent with his work on “transformation theory” based on the first-order time derivative.

I think [the transformation theory] is the piece of work which has most pleased me of all the works I’ve done in my life… [it] had become my darling. I was not interested in considering any theory which would not fit in with my darling. Therefore, the linearity of ∂/∂t was absolutely essential to me; I just couldn’t face giving up the transformation theory. 

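Whatever the motivation, Dirac sought a wave equation whose solutions would be solutions of (2), but that was linear in E. His approach was to hypothesize that (2) can be expressed as the product of two “conjugate” linear factors. Specifically, he postulated a set of basis variables $\gamma_0$, $\gamma_1$, $\gamma_2$, and $\gamma_3$ (not necessarily commuting) such that

$$E^2 - p_x^2 - p_y^2 - p_z^2 - m^2 = \left(\gamma_0E - \gamma_1p_x - \gamma_2p_y - \gamma_3p_z + m\right)\left(\gamma_0E - \gamma_1p_x - \gamma_2p_y - \gamma_3p_z - m\right)$$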

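Expanding the product and collecting terms, we find that this is a valid equality if and only if the four variables $\gamma_j$ satisfy the relations

$$\gamma_0^2 = 1,\qquad \gamma_1^2 = \gamma_2^2 = \gamma_3^2 = -1,\qquad \gamma_i\gamma_j = -\gamma_j\gamma_i \tag{3}$$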

for all i, j = 0, 1, 2, 3 with i ≠ j. These four quantities, along with unity, form the basis of what is called a Clifford algebra, after William Clifford, who investigated such mathematical structures in the late 1800s. (Dirac was unaware of this history.) It’s easy to see that any product of two or more of these entities can be reduced to a unique signed product of zero, one, two, three, or all four of them with indices in increasing order. For example, we have the identity

$$\gamma_2\gamma_0\gamma_3\gamma_0\gamma_1 = -\gamma_1\gamma_2\gamma_3$$

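which is found by repeated transpositions of neighboring entities in the left-hand string, reversing the sign with each transposition, and consolidating squared entities using the relations (3) for the squares. Thus every product of two or more of the four entities is equivalent to one of the 16 signed expressions

$$\pm 1,\quad \pm\gamma_0,\ \pm\gamma_1,\ \pm\gamma_2,\ \pm\gamma_3,\quad \pm\gamma_0\gamma_1,\ \pm\gamma_0\gamma_2,\ \pm\gamma_0\gamma_3,\ \pm\gamma_1\gamma_2,\ \pm\gamma_1\gamma_3,\ \pm\gamma_2\gamma_3,\quad \pm\gamma_0\gamma_1\gamma_2,\ \pm\gamma_0\gamma_1\gamma_3,\ \pm\gamma_0\gamma_2\gamma_3,\ \pm\gamma_1\gamma_2\gamma_3,\quad \pm\gamma_0\gamma_1\gamma_2\gamma_3 \tag{4}$$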
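The reduction rule described above is mechanical, so it can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the function name `reduce_product` and the encoding of a product such as $\gamma_2\gamma_0\gamma_3\gamma_0\gamma_1$ as the index list `[2, 0, 3, 0, 1]` are our own choices. Adjacent distinct generators are swapped with a sign change until the indices are sorted, and each adjacent equal pair is then replaced by its square from (3).

```python
from itertools import combinations

# Squares of the generators, from relations (3)
SQUARE = {0: 1, 1: -1, 2: -1, 3: -1}


def reduce_product(indices):
    """Reduce gamma_i1 gamma_i2 ... to (sign, tuple of indices in increasing order)."""
    word = list(indices)
    sign = 1
    # Sort by adjacent transpositions; distinct generators anticommute,
    # so each swap reverses the sign.
    swapped = True
    while swapped:
        swapped = False
        for k in range(len(word) - 1):
            if word[k] > word[k + 1]:
                word[k], word[k + 1] = word[k + 1], word[k]
                sign = -sign
                swapped = True
    # Equal generators are now adjacent; replace each pair by its square.
    reduced = []
    for g in word:
        if reduced and reduced[-1] == g:
            reduced.pop()
            sign *= SQUARE[g]
        else:
            reduced.append(g)
    return sign, tuple(reduced)


# One unsigned unit for each subset of {0, 1, 2, 3}, as in (4)
units = [s for r in range(5) for s in combinations(range(4), r)]
# Square of each unit: reduce the word u followed by u
squares = {u: reduce_product(u + u)[0] for u in units}

print(reduce_product([2, 0, 3, 0, 1]))  # (-1, (1, 2, 3))
print(reduce_product([3, 1, 0, 3]))     # (1, (0, 1))
print(len(units), sorted(set(squares.values())))  # 16 [-1, 1]
```

Bubble sort is used only because it performs exactly one swap per inversion, which makes the sign bookkeeping transparent.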

The 16 signed expressions in (4) may be regarded as the “units” of this algebraic structure, similar to the two signed units 1 and i of ordinary complex numbers. Obviously each of these 16 expressions, when squared, reduces to either $+1$ or $-1$. Within this algebraic structure the quadratic relativistic equation relating energy, momentum, and mass can be factored as noted above, and the full equation will be satisfied if either of the factors vanishes. Focusing on the factor with positive mass, this gives the condition

$$\gamma_0E - \gamma_1p_x - \gamma_2p_y - \gamma_3p_z - m = 0 \tag{5}$$
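The factorization of (2) into two linear factors, one of which is (5), can be checked numerically once the γ's are given a concrete matrix form. The sketch below assumes NumPy and uses the 4×4 Weyl-basis matrices that the text introduces as (7), in the block form $\gamma_0 = \begin{pmatrix}0&I\\I&0\end{pmatrix}$, $\gamma_k = \begin{pmatrix}0&-\sigma_k\\\sigma_k&0\end{pmatrix}$. The values of E, p, and m are arbitrary random numbers, since the identity holds whether or not $E^2 = p^2 + m^2$.

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4, dtype=complex)

# Weyl-basis generators in the block form of (7):
# gamma_0 = [[0, I], [I, 0]], gamma_k = [[0, -sigma_k], [sigma_k, 0]]
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))


def linear_part(E, p):
    """gamma_0 E - gamma_1 p_x - gamma_2 p_y - gamma_3 p_z"""
    return E * g0 - p[0] * g1 - p[1] * g2 - p[2] * g3


rng = np.random.default_rng(0)
E, m = rng.normal(size=2)  # arbitrary: the identity does not need E^2 = p^2 + m^2
p = rng.normal(size=3)

A = linear_part(E, p)
product = (A + m * I4) @ (A - m * I4)    # the two factors; (A - m) is condition (5)
expected = (E**2 - p @ p - m**2) * I4    # left side of (2), times the identity
print(np.allclose(product, expected))  # True
```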

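Making the usual quantization substitutions for E and p in (5), dividing through by $i\hbar$, and applying the resulting expression as an operator on a wave function ψ, Dirac arrived at the equation

$$\left(\gamma_0\frac{\partial}{\partial t} + \gamma_1\frac{\partial}{\partial x} + \gamma_2\frac{\partial}{\partial y} + \gamma_3\frac{\partial}{\partial z} + \frac{im}{\hbar}\,1\right)\psi = 0 \tag{6}$$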

for a free particle of mass m. We might be content at this point, having found a wave equation with coefficients involving some non-real basis variables, just as the original Schrödinger equation involves the imaginary unit i. However, unlike the ordinary complex numbers, the multiplication of these new basis variables $\gamma_j$ is not commutative, a fact which suggests some underlying structure. (The articulation of noncommuting entities into structures of commuting entities is a useful heuristic principle, although it is rarely mentioned explicitly.) Accordingly, Dirac sought a representation of these basis variables in terms of complex numbers. He found that the $\gamma_j$ variables can be represented by 4×4 matrices with complex elements, with the understanding that the symbols “1” and “0” in equation (6) represent the identity matrix and the null matrix respectively. For example, the following matrices satisfy all the requirements:

$$\gamma_0 = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix},\quad \gamma_1 = \begin{pmatrix}0&0&0&-1\\0&0&-1&0\\0&1&0&0\\1&0&0&0\end{pmatrix},\quad \gamma_2 = \begin{pmatrix}0&0&0&i\\0&0&-i&0\\0&-i&0&0\\i&0&0&0\end{pmatrix},\quad \gamma_3 = \begin{pmatrix}0&0&-1&0\\0&0&0&1\\1&0&0&0\\0&-1&0&0\end{pmatrix} \tag{7}$$
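As a numerical sanity check of the statements about (7), the following minimal NumPy sketch verifies the anticommutation relations (3) and confirms that the 16 products in (4), each flattened to a vector of length 16, have rank 16 and therefore span all 4×4 complex matrices. The helper names `satisfies_relations` and `unit_matrices` are ours, chosen for illustration.

```python
import itertools
import numpy as np

# Pauli matrices and 2x2 building blocks
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)

# The matrices of (7) in block form
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))
gammas = [g0, g1, g2, g3]
eta = np.diag([1.0, -1.0, -1.0, -1.0])  # signs of the squares in (3)


def satisfies_relations(gs):
    """True if gs[mu] gs[nu] + gs[nu] gs[mu] = 2 eta[mu, nu] I for all mu, nu."""
    I4 = np.eye(4)
    return all(
        np.allclose(gs[mu] @ gs[nu] + gs[nu] @ gs[mu], 2 * eta[mu, nu] * I4)
        for mu, nu in itertools.product(range(4), repeat=2)
    )


def unit_matrices(gs):
    """The 16 products gs[i1] @ gs[i2] @ ... with i1 < i2 < ..., as in (4)."""
    units = []
    for r in range(5):
        for subset in itertools.combinations(range(4), r):
            u = np.eye(4, dtype=complex)
            for k in subset:
                u = u @ gs[k]
            units.append(u)
    return units


units = unit_matrices(gammas)
# Rank 16 of the flattened units means they span all 4x4 complex matrices
rank = np.linalg.matrix_rank(np.array([u.ravel() for u in units]))
print(satisfies_relations(gammas), len(units), rank)  # True 16 16
```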

The matrices in (7) can be used to generate the 16 “units” given by (4), and then every 4×4 matrix with complex elements can be expressed as a linear combination of those 16 matrices. Since the operator is now a 4×4 matrix, the wave equation (6) is a matrix equation, which implies that the wave function of this particle must actually have four components, so it can be expressed as the vector

$$\psi = \begin{pmatrix}\psi_1\\ \psi_2\\ \psi_3\\ \psi_4\end{pmatrix}$$

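Thus, in a sense, we must consider four distinct versions of the particle. However, the elements of the γ matrices are not independent, as shown by the fact that the γ matrices listed above can be written in the form

$$\gamma_0 = \begin{pmatrix}0 & I\\ I & 0\end{pmatrix},\quad \gamma_1 = \begin{pmatrix}0 & -\sigma_x\\ \sigma_x & 0\end{pmatrix},\quad \gamma_2 = \begin{pmatrix}0 & -\sigma_y\\ \sigma_y & 0\end{pmatrix},\quad \gamma_3 = \begin{pmatrix}0 & -\sigma_z\\ \sigma_z & 0\end{pmatrix}$$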

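where I is the 2×2 identity matrix and $\sigma_x$, $\sigma_y$, $\sigma_z$ are the 2×2 Pauli spin matrices

$$\sigma_x = \begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad \sigma_y = \begin{pmatrix}0&-i\\i&0\end{pmatrix},\qquad \sigma_z = \begin{pmatrix}1&0\\0&-1\end{pmatrix}$$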

These are called “spin” matrices because they are characterized by the relations 

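$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = I,\qquad \sigma_x\sigma_y = -\sigma_y\sigma_x = i\sigma_z,\qquad \sigma_y\sigma_z = -\sigma_z\sigma_y = i\sigma_x,\qquad \sigma_z\sigma_x = -\sigma_x\sigma_z = i\sigma_y$$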

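which are analogous to the relations for the components of angular momentum of a classical particle. Expressing the vector of wave functions as a two-dimensional vector of two-dimensional vectors $\phi_a$ and $\phi_b$ by

$$\phi_a = \begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix},\qquad \phi_b = \begin{pmatrix}\psi_3\\ \psi_4\end{pmatrix},\qquad \psi = \begin{pmatrix}\phi_a\\ \phi_b\end{pmatrix}$$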

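we can write Dirac’s wave equation explicitly as

$$\begin{pmatrix}0 & I\\ I & 0\end{pmatrix}\frac{\partial}{\partial t}\begin{pmatrix}\phi_a\\ \phi_b\end{pmatrix} + \begin{pmatrix}0 & -\boldsymbol{\sigma}\cdot\nabla\\ \boldsymbol{\sigma}\cdot\nabla & 0\end{pmatrix}\begin{pmatrix}\phi_a\\ \phi_b\end{pmatrix} + \frac{im}{\hbar}\begin{pmatrix}\phi_a\\ \phi_b\end{pmatrix} = 0$$

where $\boldsymbol{\sigma}\cdot\nabla = \sigma_x\,\partial/\partial x + \sigma_y\,\partial/\partial y + \sigma_z\,\partial/\partial z$. Carrying out the matrix multiplications, this represents the following two equations

$$\left(\frac{\partial}{\partial t} - \boldsymbol{\sigma}\cdot\nabla\right)\phi_b = -\frac{im}{\hbar}\,\phi_a \tag{8}$$

$$\left(\frac{\partial}{\partial t} + \boldsymbol{\sigma}\cdot\nabla\right)\phi_a = -\frac{im}{\hbar}\,\phi_b \tag{9}$$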

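As a check, by substituting the expression for $\phi_b$ from the second equation into the first, and simplifying by making use of the properties of the spin matrices (which give $(\boldsymbol{\sigma}\cdot\nabla)^2 = \nabla^2$), we can verify that the two-dimensional vector $\phi_a$ satisfies the Klein-Gordon equation

$$\frac{\partial^2\phi_a}{\partial t^2} - \nabla^2\phi_a + \frac{m^2}{\hbar^2}\,\phi_a = 0 \tag{10}$$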

which we recall is simply the quantized version of the relativistic energy-momentum equation $E^2 - p^2 = m^2$. Likewise we can show that the two-dimensional vector $\phi_b$ satisfies the very same equation, and therefore (as required) each of the four components $\psi_1$, $\psi_2$, $\psi_3$, $\psi_4$ of the original four-dimensional vector of wave functions individually satisfies the Klein-Gordon equation. (Equations (8) and (9) can be seen as a generalization of the Cauchy-Riemann conditions for analyticity, with $\phi_a$ and $\phi_b$ being analogous to conjugate harmonic functions.) However, these equations also show that the four components of ψ are not independent, because given any solution $\phi_a$ of (10) we can compute $\phi_b$ using (9). These wave functions will then automatically satisfy (8) as well. Therefore, either $\phi_a$ by itself or $\phi_b$ by itself is sufficient to determine the complete wave function ψ for a given basis. Also, comparing equations (8) and (9), we see that $\phi_a$ and $\phi_b$ are symmetrical except that the signs of the Pauli spin matrices are reversed. Thus a particle described by Dirac’s equation has just two possible intrinsic states relative to a given basis, corresponding to the left-handed and right-handed spin states of the particle for that basis.
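A plane-wave trial solution makes the two-state conclusion concrete. Assuming $\psi = u\,e^{-i(Et - \mathbf{p}\cdot\mathbf{x})/\hbar}$ with a constant four-component vector u, equation (6) reduces to the matrix condition (5) acting on u, so nonzero solutions exist only when that 4×4 matrix is singular. This minimal NumPy sketch (Weyl basis (7), arbitrary momentum values) finds a two-dimensional null space exactly when $E^2 = p^2 + m^2$, and also checks the Pauli identity $(\boldsymbol{\sigma}\cdot\mathbf{p})^2 = p^2 I$ that underlies the simplification to (10).

```python
import numpy as np

# Pauli matrices and the Weyl-basis generators of (7)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))


def dirac_matrix(E, p, m):
    """Condition (5) as a 4x4 matrix: gamma_0 E - gamma.p - m I."""
    return E * g0 - p[0] * g1 - p[1] * g2 - p[2] * g3 - m * np.eye(4)


def null_dimension(A, tol=1e-10):
    """Number of (numerically) zero singular values of A."""
    return int(np.sum(np.linalg.svd(A, compute_uv=False) < tol))


m = 1.0
p = np.array([0.3, -1.2, 0.5])
E = np.sqrt(m**2 + p @ p)  # on-shell energy, E^2 = p^2 + m^2

on_shell = null_dimension(dirac_matrix(E, p, m))
off_shell = null_dimension(dirac_matrix(E + 0.1, p, m))

# (sigma . p)^2 = |p|^2 I, the property behind (sigma . grad)^2 = grad^2
sp = p[0] * sx + p[1] * sy + p[2] * sz
pauli_ok = np.allclose(sp @ sp, (p @ p) * np.eye(2))
print(on_shell, off_shell, pauli_ok)  # 2 0 True
```

The two independent null vectors at the on-shell energy are the two intrinsic states referred to above.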

One useful way of expressing the wave function of such a particle is as a linear combination of two mutually exclusive (i.e., “orthogonal”) components. Notice that, of the 16 “unit” matrices identified in (4), only the matrix given by $\gamma_0\gamma_1\gamma_2\gamma_3$ anticommutes with all four of the generators. For convenience we will give this special unit matrix, multiplied by i (which of course doesn’t affect the anticommutative properties), the special name $\gamma_5$, which is to say, we define

$$\gamma_5 = i\,\gamma_0\gamma_1\gamma_2\gamma_3$$

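The factor of i is included so that $\gamma_5^2 = I$ (without it the square would be $-I$), which will prove to be convenient below. Since this matrix anticommutes with each of the generators, multiplying equation (6) on the left by $\gamma_5$ and moving $\gamma_5$ to the right of each $\gamma_j$ gives

$$\left(-\gamma_0\frac{\partial}{\partial t} - \gamma_1\frac{\partial}{\partial x} - \gamma_2\frac{\partial}{\partial y} - \gamma_3\frac{\partial}{\partial z} + \frac{im}{\hbar}\right)\gamma_5\psi = 0$$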

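Therefore, if ψ is a solution of Dirac’s equation (6), it follows that $\gamma_5\psi$ satisfies a similar equation, but with the momentum and the energy negated relative to the sign of the mass. Alternatively we can say that $\gamma_5\psi$ is a solution of the “negative mass” version of Dirac’s equation, i.e.,

$$\left(\gamma_0\frac{\partial}{\partial t} + \gamma_1\frac{\partial}{\partial x} + \gamma_2\frac{\partial}{\partial y} + \gamma_3\frac{\partial}{\partial z} - \frac{im}{\hbar}\right)\gamma_5\psi = 0$$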

Recall that this corresponds to the other factor of the equation $E^2 - p^2 - m^2 = 0$, so solutions of this equation are, strictly speaking, equally valid solutions of the Klein-Gordon equation from which we began. Of course if m = 0 there is no distinction between the two factors. Even for cases when m is not zero, the distinction between the two factors may be extremely small if the energy and momentum are extremely large, i.e., for a particle moving at close to the speed of light.

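Now, when there is no distinction between the two factors (exactly for m = 0, and approximately in the high-energy limit just described), for any given solution ψ the wave functions $I\psi$ and $\gamma_5\psi$ are both solutions of the same equation, and since the Dirac equation is linear, any linear combination of them is also a solution. Therefore, if we define the matrices $P_L = (I - \gamma_5)/2$ and $P_R = (I + \gamma_5)/2$, we know that $P_L\psi$ and $P_R\psi$ are both solutions, which we will call $\psi_L$ and $\psi_R$ respectively. Remembering that $\gamma_5^2 = I$, it’s easy to show that the $P_L$ and $P_R$ matrices have the following properties

$$P_L + P_R = I,\qquad P_L^2 = P_L,\qquad P_R^2 = P_R,\qquad P_LP_R = P_RP_L = 0$$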

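so they can be regarded as a complete set of projection operators. Furthermore, based on the generators (7), the matrix $\gamma_5$ is diagonal, and we have

$$\gamma_5 = \begin{pmatrix}I & 0\\ 0 & -I\end{pmatrix},\qquad P_L = \begin{pmatrix}0 & 0\\ 0 & I\end{pmatrix},\qquad P_R = \begin{pmatrix}I & 0\\ 0 & 0\end{pmatrix}$$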

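Therefore these projection operators resolve the full wave function for a given particle into two parts, namely $\psi = \psi_L + \psi_R$, where the nonzero parts of $\psi_L$ and $\psi_R$ are just the two-dimensional vectors $\phi_b$ and $\phi_a$ discussed previously, respectively, i.e.,

$$\psi_L = P_L\psi = \begin{pmatrix}0\\ \phi_b\end{pmatrix},\qquad \psi_R = P_R\psi = \begin{pmatrix}\phi_a\\ 0\end{pmatrix}$$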
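The algebra of $\gamma_5$ and the projection operators can be confirmed in a few lines. This minimal NumPy sketch builds $\gamma_5 = i\gamma_0\gamma_1\gamma_2\gamma_3$ from the Weyl matrices (7), checks the relations listed above, and shows that a plane-wave solution u of (5) is mapped by $\gamma_5$ to a solution of the negative-mass condition $(E\gamma_0 - \boldsymbol{\gamma}\cdot\mathbf{p} + m)\gamma_5u = 0$. The sample spinor and momentum are arbitrary.

```python
import numpy as np

# Weyl-basis generators of (7)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))
I4 = np.eye(4)

g5 = 1j * g0 @ g1 @ g2 @ g3  # gamma_5 = i gamma_0 gamma_1 gamma_2 gamma_3
PL = (I4 - g5) / 2
PR = (I4 + g5) / 2

checks = {
    "gamma5 squared is I": np.allclose(g5 @ g5, I4),
    "gamma5 anticommutes": all(np.allclose(g5 @ g + g @ g5, 0) for g in (g0, g1, g2, g3)),
    "projectors idempotent": np.allclose(PL @ PL, PL) and np.allclose(PR @ PR, PR),
    "projectors orthogonal": np.allclose(PL @ PR, 0) and np.allclose(PR @ PL, 0),
    "projectors complete": np.allclose(PL + PR, I4),
    "gamma5 = diag(1, 1, -1, -1)": np.allclose(g5, np.diag([1, 1, -1, -1])),
}
print(all(checks.values()))  # True

psi = np.array([1, 2, 3, 4], dtype=complex)
print(PL @ psi, PR @ psi)  # lower half (phi_b), then upper half (phi_a)

# A plane-wave solution u of (5), and gamma_5 u for the negative-mass condition
m, p = 1.0, np.array([0.3, -1.2, 0.5])
E = np.sqrt(m**2 + p @ p)
A = E * g0 - p[0] * g1 - p[1] * g2 - p[2] * g3
_, s, vh = np.linalg.svd(A - m * I4)
u = vh[-1].conj()  # right singular vector for the zero singular value
print(np.allclose((A - m * I4) @ u, 0), np.allclose((A + m * I4) @ (g5 @ u), 0))  # True True
```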

As explained previously, $\phi_a$ and $\phi_b$ are the same except for having opposite intrinsic spin. The fact that particles satisfying the Dirac equation (such as electrons) have two distinct states of quantum spin is highly consequential, because it accounts for the valency properties of atoms (each quantum “orbit” can be occupied by two electrons with opposite spins), which makes possible the whole variety of chemical interactions in nature.


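The set of γ matrices discussed above is not unique. For example, if we replace the original matrix $\gamma_0$ in (7) with the matrix $\gamma_5$, we get the equally satisfactory set of generators

$$\gamma'_0 = \begin{pmatrix}I & 0\\ 0 & -I\end{pmatrix},\quad \gamma'_1 = \begin{pmatrix}0 & -\sigma_x\\ \sigma_x & 0\end{pmatrix},\quad \gamma'_2 = \begin{pmatrix}0 & -\sigma_y\\ \sigma_y & 0\end{pmatrix},\quad \gamma'_3 = \begin{pmatrix}0 & -\sigma_z\\ \sigma_z & 0\end{pmatrix} \tag{11}$$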

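This basis is convenient for dealing with stationary or slow-moving particles. To see why, notice that the analogs of equations (8) and (9) for this basis are

$$\left(\frac{\partial}{\partial t} + \frac{im}{\hbar}\right)\phi_a = \boldsymbol{\sigma}\cdot\nabla\,\phi_b,\qquad \left(\frac{\partial}{\partial t} - \frac{im}{\hbar}\right)\phi_b = \boldsymbol{\sigma}\cdot\nabla\,\phi_a \tag{12}$$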

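where $\phi_a$ and $\phi_b$ are the two-dimensional vector functions discussed previously, and $\boldsymbol{\sigma}\cdot\nabla = \sigma_x\,\partial/\partial x + \sigma_y\,\partial/\partial y + \sigma_z\,\partial/\partial z$. Now, for a stationary particle, the wave function ψ should be independent of time, except possibly for an unobservable phase advance, which is to say, the wave function can be factored into a spatial part and a unit complex temporal part as follows

$$\psi(t,\mathbf{x}) = e^{-i\omega t}\,\phi(\mathbf{x}) \tag{13}$$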

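for some real constant ω, where φ is a four-dimensional vector of spatial functions. Thus we can write

$$\begin{pmatrix}\phi_a(t,\mathbf{x})\\ \phi_b(t,\mathbf{x})\end{pmatrix} = e^{-i\omega t}\begin{pmatrix}\phi_a(\mathbf{x})\\ \phi_b(\mathbf{x})\end{pmatrix}$$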

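where $\phi_a(\mathbf{x})$ and $\phi_b(\mathbf{x})$ are two-dimensional spatial wave functions. Substituting for these vectors in equations (12) and rearranging terms, we get

$$\boldsymbol{\sigma}\cdot\nabla\,\phi_b = -i\left(\omega - \frac{m}{\hbar}\right)\phi_a,\qquad \boldsymbol{\sigma}\cdot\nabla\,\phi_a = -i\left(\omega + \frac{m}{\hbar}\right)\phi_b$$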

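It follows (applying $\boldsymbol{\sigma}\cdot\nabla$ to each equation and using $(\boldsymbol{\sigma}\cdot\nabla)^2 = \nabla^2$) that $\phi_a$ and $\phi_b$ must each satisfy a relation of the form

$$\nabla^2\phi + \left(\omega^2 - \frac{m^2}{\hbar^2}\right)\phi = 0$$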

Thus in order for $\phi_a$ and $\phi_b$ to be harmonic functions, the quantity in parentheses in this equation must vanish, so choosing the positive frequency solution, and noting that (since the particle in this case is assumed to be stationary or nearly so) we have E = m, we find that the phase angular speed ω for a stationary particle must be related to the mass-energy by

$$\hbar\omega = m = E$$

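With this condition the preceding equations then reduce to

$$\boldsymbol{\sigma}\cdot\nabla\,\phi_b = 0,\qquad \boldsymbol{\sigma}\cdot\nabla\,\phi_a = -\frac{2im}{\hbar}\,\phi_b$$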

We expect both of these “spin divergences” to vanish, so the second equation requires us to set $\phi_b = 0$. Hence, with this basis applied to stationary (or approximately to slowly moving) particles, we have $\phi_b = 0$ and

$$\psi(t,\mathbf{x}) = e^{-imt/\hbar}\begin{pmatrix}\phi_a(\mathbf{x})\\ 0\end{pmatrix}$$

An abbreviated version of this approach is to note that, for a stationary particle (i.e., with zero momentum), equation (5) reduces to $\gamma'_0E - Im = 0$. (Note that we are using the primed γ basis.) Making the usual quantum substitution for E, the corresponding Dirac equation is simply

$$i\hbar\,\gamma'_0\frac{\partial\psi}{\partial t} - m\psi = 0$$

As before, we express the wave function of a stationary particle in the form (13). Making this substitution into the above equation and simplifying, we get

$$\left(\hbar\omega\,\gamma'_0 - mI\right)\phi = 0$$

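Inserting the $\gamma'_0$ matrix and writing this equation explicitly, we have

$$\begin{pmatrix}\hbar\omega - m & 0 & 0 & 0\\ 0 & \hbar\omega - m & 0 & 0\\ 0 & 0 & -\hbar\omega - m & 0\\ 0 & 0 & 0 & -\hbar\omega - m\end{pmatrix}\begin{pmatrix}\phi_1\\ \phi_2\\ \phi_3\\ \phi_4\end{pmatrix} = 0$$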

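This shows that if ω is not equal to $\pm m/\hbar$ then φ must be identically zero. The only nonvanishing solutions have ω equal to $\pm m/\hbar$. Assuming a positive frequency, the preceding equation becomes

$$\begin{pmatrix}0&0&0&0\\0&0&0&0\\0&0&-2m&0\\0&0&0&-2m\end{pmatrix}\begin{pmatrix}\phi_1\\ \phi_2\\ \phi_3\\ \phi_4\end{pmatrix} = 0$$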

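Therefore, we find (again) that $\phi_b$ must vanish, and the wave function is of the form

$$\psi(t,\mathbf{x}) = e^{-imt/\hbar}\begin{pmatrix}\phi_1(\mathbf{x})\\ \phi_2(\mathbf{x})\\ 0\\ 0\end{pmatrix}$$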

The two components of $\phi_a$ (i.e., $\phi_1$ and $\phi_2$) correspond to the two spin states of the particle.
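The abbreviated stationary-particle argument reduces to finding the null space of a diagonal 4×4 matrix, which the following minimal NumPy sketch does numerically. It assumes illustrative units with ħ = m = 1, builds the Dirac basis (11) by replacing $\gamma_0$ in the Weyl set (7) with $\gamma_5$, and uses a hypothetical helper `stationary_null_space`.

```python
import itertools
import numpy as np

# Weyl-basis generators of (7)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))

g5 = 1j * g0 @ g1 @ g2 @ g3
dirac = [g5, g1, g2, g3]  # (11): gamma_0 replaced by gamma_5
eta = np.diag([1.0, -1.0, -1.0, -1.0])

relations_ok = all(
    np.allclose(dirac[a] @ dirac[b] + dirac[b] @ dirac[a], 2 * eta[a, b] * np.eye(4))
    for a, b in itertools.product(range(4), repeat=2)
)

hbar, m = 1.0, 1.0  # illustrative units


def stationary_null_space(omega, tol=1e-10):
    """Rows spanning the solutions phi of (hbar omega gamma'_0 - m I) phi = 0."""
    A = hbar * omega * dirac[0] - m * np.eye(4)
    _, s, vh = np.linalg.svd(A)
    return vh[s < tol].conj()


pos = stationary_null_space(+m / hbar)
neg = stationary_null_space(-m / hbar)
other = stationary_null_space(0.7 * m / hbar)
print(relations_ok, len(pos), len(neg), len(other))  # True 2 2 0
# Positive frequency: only phi_1, phi_2 survive; negative frequency: only phi_3, phi_4
print(np.allclose(pos[:, 2:], 0), np.allclose(neg[:, :2], 0))  # True True
```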

In the preceding discussion we’ve mentioned two different sets of γ matrices, defined in (7) and (11), that each represent a satisfactory basis for evaluating Dirac’s equation. These are called the Weyl basis and the Dirac basis respectively. As noted above, the Weyl basis is convenient for high-speed electrons (for which the momentum is much larger than the rest mass), and the Dirac basis is convenient for stationary or low-speed particles.

However, these are by no means the only two satisfactory sets of basis matrices. In fact, it’s easy to verify that if $\gamma_j$, j = 0, 1, 2, 3 is a set of matrices that satisfy all the requirements of Dirac’s equation, then for any invertible 4×4 matrix S the matrices $S\gamma_jS^{-1}$ for j = 0, 1, 2, 3 also satisfy all the requirements. This is vitally important for achieving one of Dirac’s main objectives, which was to find a Lorentz-invariant wave equation for a particle. It isn’t obvious that a linear equation of the kind sought by Dirac (as opposed to, say, the Klein-Gordon equation) could ever be relativistic, but remarkably it turns out that Dirac’s equation actually is invariant under Lorentz transformations, provided we take into account the transformation properties of the γ matrices.
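The claim that any similarity transformation $S\gamma_jS^{-1}$ preserves the defining relations (3) is easy to test numerically. In this minimal NumPy sketch S is a random complex matrix (invertible with probability one), and the anticommutator test `satisfies_relations` is a helper of our own devising.

```python
import itertools
import numpy as np

# Weyl-basis generators of (7)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))
gammas = [g0, g1, g2, g3]
eta = np.diag([1.0, -1.0, -1.0, -1.0])


def satisfies_relations(gs):
    """True if gs[mu] gs[nu] + gs[nu] gs[mu] = 2 eta[mu, nu] I for all mu, nu."""
    return all(
        np.allclose(gs[mu] @ gs[nu] + gs[nu] @ gs[mu], 2 * eta[mu, nu] * np.eye(4))
        for mu, nu in itertools.product(range(4), repeat=2)
    )


rng = np.random.default_rng(1)
S = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))  # generic, hence invertible
S_inv = np.linalg.inv(S)
transformed = [S @ g @ S_inv for g in gammas]

print(satisfies_relations(gammas), satisfies_relations(transformed))  # True True
print(np.allclose(transformed[0], g0))  # False: the matrices themselves change
```

The reason is algebraic: $S\gamma_\mu S^{-1}S\gamma_\nu S^{-1} = S\gamma_\mu\gamma_\nu S^{-1}$, so every anticommutator is conjugated by S, and conjugation leaves multiples of the identity unchanged.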

To simplify the notation, let $x^0$, $x^1$, $x^2$, $x^3$ denote the coordinates t, x, y, z respectively (where the superscripts are indices, not exponents). Also, let the contravariant and covariant components of a given 4-vector be denoted with superscripted and subscripted indices respectively, and let the fundamental metric tensor of (flat) spacetime be denoted by

$$\eta_{\mu\nu} = \eta^{\mu\nu} = \begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix}$$

Recall that the contravariant and covariant components of any 4-vector a are related by

$$a_\mu = \eta_{\mu\nu}\,a^\nu,\qquad a^\mu = \eta^{\mu\nu}\,a_\nu$$

where now we adopt the convention that summation over any repeated index in a given term is implied. The Lorentz-invariant scalar product of two 4-vectors a and b is then expressed as

$$a\cdot b = \eta_{\mu\nu}\,a^\mu b^\nu = a^\mu b_\mu = a^0b^0 - a^1b^1 - a^2b^2 - a^3b^3$$

The differential operators $\partial/\partial x^\mu$ transform as the components of a covariant 4-vector, so for convenience we will denote them by $\partial_\mu$. Using the summation convention, we can then rewrite the Dirac equation (6) as

$$\left(\gamma^\mu\partial_\mu + \frac{im}{\hbar}\right)\psi = 0 \tag{14}$$

Of course, the γ matrices are not 4-vectors, so the first term inside the parentheses of this equation is not a scalar; it is the sum of four matrices, each multiplied by one of the components $\partial_\mu$. Our use of superscripts here instead of subscripts for the γ matrices (so that $\gamma^\mu$ denotes the same matrix as $\gamma_\mu$ in (6)) is just for typographical consistency with the summation convention. An infinitesimal Lorentz transformation of the column vector $\partial_\mu$ can be written (to first order) as

$$\partial'_\mu = \partial_\mu + \Lambda_\mu{}^\nu\,\partial_\nu \tag{15}$$

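where Λ is a 4×4 matrix whose elements are infinitesimal quantities of the first order. Note that the components of Λ with both indices raised, $\Lambda^{\mu\nu} = \eta^{\mu\alpha}\Lambda_\alpha{}^\nu$, are antisymmetric, which follows directly from the fact that the scalar product is invariant, i.e., we have

$$\eta^{\mu\nu}\,\partial'_\mu\partial'_\nu = \eta^{\mu\nu}\left(\partial_\mu + \Lambda_\mu{}^\alpha\partial_\alpha\right)\left(\partial_\nu + \Lambda_\nu{}^\beta\partial_\beta\right) = \eta^{\mu\nu}\,\partial_\mu\partial_\nu$$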

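To the first order this reduces to

$$\eta^{\mu\nu}\Lambda_\mu{}^\alpha\,\partial_\alpha\partial_\nu + \eta^{\mu\nu}\Lambda_\nu{}^\beta\,\partial_\mu\partial_\beta = \left(\Lambda^{\nu\alpha} + \Lambda^{\alpha\nu}\right)\partial_\alpha\partial_\nu = 0$$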

so, since $\partial_\alpha\partial_\nu$ is symmetric in α and ν, we have $\Lambda^{\mu\nu} + \Lambda^{\nu\mu} = 0$.
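The antisymmetry argument can also be checked numerically. Writing (15) as $\partial' = (I + A)\,\partial$ with $A_{\mu\nu} = \Lambda_\mu{}^\nu = \eta_{\mu\alpha}\Lambda^{\alpha\nu}$, the first-order change of the quadratic form $\partial^T\eta\,\partial$ is $\partial^T(A^T\eta + \eta A)\,\partial$. In this minimal NumPy sketch the random matrices are arbitrary; an antisymmetric $\Lambda^{\mu\nu}$ gives no first-order change, while a symmetric one does.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # eta_{mu nu}, numerically equal to eta^{mu nu}


def first_order_change(L_up):
    """First-order change of eta^{mu nu} d_mu d_nu under (15).

    With d' = (I + A) d and A[mu, nu] = Lambda_mu^nu = eta_{mu alpha} Lambda^{alpha nu},
    the quadratic form d^T eta d changes, to first order, by d^T (A^T eta + eta A) d.
    """
    A = eta @ L_up
    return A.T @ eta + eta @ A


rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
antisymmetric = B - B.T  # Lambda^{mu nu} = -Lambda^{nu mu}
symmetric = B + B.T      # for contrast

print(np.allclose(first_order_change(antisymmetric), 0))  # True
print(np.allclose(first_order_change(symmetric), 0))      # False
```

Algebraically, $A^T\eta + \eta A$ collapses to $\Lambda^T + \Lambda$ (with both indices raised) because $\eta^2 = I$, which is exactly the combination that must vanish.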

Solving the four linear equations (15) for the $\partial_\mu$, and again omitting second-order terms in the infinitesimal matrix Λ, gives the reciprocal relation

$$\partial_\mu = \partial'_\mu - \Lambda_\mu{}^\nu\,\partial'_\nu$$

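Substituting into (14) gives

$$\left(\gamma^\mu\left(\partial'_\mu - \Lambda_\mu{}^\nu\,\partial'_\nu\right) + \frac{im}{\hbar}\right)\psi = 0 \tag{16}$$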

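By simply interchanging the names of the two summation indices in the second part of the first term, we have the identity

$$\gamma^\mu\Lambda_\mu{}^\nu\,\partial'_\nu = \gamma^\nu\Lambda_\nu{}^\mu\,\partial'_\mu$$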

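so equation (16) can be written as

$$\left(\left(\gamma^\mu - \gamma^\nu\Lambda_\nu{}^\mu\right)\partial'_\mu + \frac{im}{\hbar}\right)\psi = 0 \tag{17}$$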

Now suppose we define another 4×4 matrix, which we will call M, whose elements (like those of Λ) are infinitesimal quantities of the first order, and are such that

$$M\gamma^\mu - \gamma^\mu M = \gamma^\nu\Lambda_\nu{}^\mu \tag{18}$$

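for μ = 0, 1, 2, 3. These four equations determine the matrix M up to an additive multiple of the identity (which commutes with every $\gamma^\mu$); an explicit solution is found below. If we now multiply through equation (17) on the left by the matrix I + M we get

$$(I + M)\left(\left(\gamma^\mu - \gamma^\nu\Lambda_\nu{}^\mu\right)\partial'_\mu + \frac{im}{\hbar}\right)\psi = 0 \tag{19}$$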

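Expanding the coefficient of $\partial'_\mu$ in this expression gives

$$(I + M)\left(\gamma^\mu - \gamma^\nu\Lambda_\nu{}^\mu\right) = \gamma^\mu + M\gamma^\mu - \gamma^\nu\Lambda_\nu{}^\mu - M\gamma^\nu\Lambda_\nu{}^\mu$$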

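The last term contains a product of M and Λ, each of which is infinitesimal to the first order, so it is second order and can be dropped. We can also replace the third term by making use of (18), so we have

$$(I + M)\left(\gamma^\mu - \gamma^\nu\Lambda_\nu{}^\mu\right) = \gamma^\mu + M\gamma^\mu - \left(M\gamma^\mu - \gamma^\mu M\right) = \gamma^\mu(I + M) \tag{20}$$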

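Substituting back into equation (19), and noting that each $\partial'_\mu$ is a scalar operator so it commutes with the constant matrix (I + M), we can factor (I + M) on the right side, to give

$$\left(\gamma^\mu\partial'_\mu + \frac{im}{\hbar}\right)(I + M)\,\psi = 0$$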

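Therefore, if we define

$$\psi'(x') = (I + M)\,\psi(x) \tag{21}$$

we have

$$\left(\gamma^\mu\partial'_\mu + \frac{im}{\hbar}\right)\psi'(x') = 0 \tag{22}$$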

which is of the same form as the original Dirac equation. Of course, before we can assert that this equation is invariant under Lorentz transformations, we need to show that ψ'(x') represents the same physical wave function with respect to the spacetime coordinates x' as is represented by ψ(x) with respect to the spacetime coordinates x. This requires us to show that they have the same probability density at any given event. First we need to determine an explicit expression for the matrix M in terms of the infinitesimal Lorentz transformation Λ. Recall that we defined M implicitly by the equation (18)

$$M\gamma^\mu - \gamma^\mu M = \gamma^\nu\Lambda_\nu{}^\mu = \Lambda_{\alpha\beta}\,\gamma^\alpha\eta^{\beta\mu}$$

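The right-hand side can be split into two equal parts, renaming the summation indices in the second part:

$$\Lambda_{\alpha\beta}\,\gamma^\alpha\eta^{\beta\mu} = \tfrac{1}{2}\Lambda_{\alpha\beta}\,\gamma^\alpha\eta^{\beta\mu} + \tfrac{1}{2}\Lambda_{\beta\alpha}\,\gamma^\beta\eta^{\alpha\mu}$$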

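Making use of the antisymmetry of Λ, this can be written as

$$\Lambda_{\alpha\beta}\,\gamma^\alpha\eta^{\beta\mu} = \tfrac{1}{2}\Lambda_{\alpha\beta}\left(\gamma^\alpha\eta^{\beta\mu} - \eta^{\alpha\mu}\gamma^\beta\right)$$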

Noting that equations (3) can be summarized by the expression

$$\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2\eta^{\mu\nu}I$$

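we can substitute for η in the preceding equation to give

$$\Lambda_{\alpha\beta}\,\gamma^\alpha\eta^{\beta\mu} = \tfrac{1}{4}\Lambda_{\alpha\beta}\left[\gamma^\alpha\left(\gamma^\beta\gamma^\mu + \gamma^\mu\gamma^\beta\right) - \left(\gamma^\alpha\gamma^\mu + \gamma^\mu\gamma^\alpha\right)\gamma^\beta\right] = \tfrac{1}{4}\Lambda_{\alpha\beta}\left(\gamma^\alpha\gamma^\beta\gamma^\mu - \gamma^\mu\gamma^\alpha\gamma^\beta\right)$$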

Therefore, equation (18) implies that

$$M = \tfrac{1}{4}\,\Lambda_{\alpha\beta}\,\gamma^\alpha\gamma^\beta$$

Again it must be kept in mind that the $\gamma^\mu$ symbols represent matrices, not vectors as their single indices might suggest, so M is a matrix, not a scalar.
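The explicit solution for M can be verified against its defining condition (18). This minimal NumPy sketch draws a random antisymmetric $\Lambda^{\mu\nu}$, lowers indices with η, forms $M = \tfrac14\Lambda_{\alpha\beta}\gamma^\alpha\gamma^\beta$ in the Weyl basis (7), and checks both (18) and the relation $\gamma^0M^\dagger\gamma^0 = -M$ used in the next step. The array names `L_up`, `L_low`, and `L_mixed` are our own.

```python
import numpy as np

# Weyl-basis generators of (7); gamma^mu here is the same matrix as gamma_mu in (6)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))
gammas = [g0, g1, g2, g3]
eta = np.diag([1.0, -1.0, -1.0, -1.0])

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
L_up = B - B.T            # Lambda^{mu nu}, antisymmetric
L_low = eta @ L_up @ eta  # Lambda_{alpha beta}
L_mixed = eta @ L_up      # L_mixed[nu, mu] = Lambda_nu^mu

# M = (1/4) Lambda_{alpha beta} gamma^alpha gamma^beta
M = 0.25 * sum(L_low[a, b] * gammas[a] @ gammas[b] for a in range(4) for b in range(4))

# Condition (18): M gamma^mu - gamma^mu M = gamma^nu Lambda_nu^mu, for each mu
ok_18 = all(
    np.allclose(M @ gammas[mu] - gammas[mu] @ M,
                sum(L_mixed[nu, mu] * gammas[nu] for nu in range(4)))
    for mu in range(4)
)
# gamma^0 M^dagger gamma^0 = -M
ok_adjoint = np.allclose(g0 @ M.conj().T @ g0, -M)
print(ok_18, ok_adjoint)  # True True
```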

Now, to prove that the wave function in (22) represents the same physical situation as the original wave function, recall that in nonrelativistic quantum mechanics the probability density is considered to be invariant, given by multiplying the state vector by its conjugate transpose, but in a relativistic theory the probability density cannot be invariant under Lorentz transformations if probability is to be invariant, just as the charge density cannot be invariant if charge is invariant. Instead, the probability density as a function of space and time must transform like the time component of a 4-vector (just as does charge density). To determine the entire 4-vector representing both the probability density and the probability current, recall that, as discussed previously, the four components of ψ can be split into two interrelated two-component vectors $\phi_a$ and $\phi_b$, and the interchange of these two represents another kind of “transposition” (along with transposing the overall vector and taking the complex conjugates of the components). Hence, although the probability density of an ordinary state vector is generally of the form

$$\psi^\dagger\psi = \begin{pmatrix}\phi_a^\dagger & \phi_b^\dagger\end{pmatrix}\begin{pmatrix}\phi_a\\ \phi_b\end{pmatrix}$$

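we might expect this to be just one component of a complete expression involving the transposition of the two halves of the leading factor, which gives

$$\begin{pmatrix}\phi_b^\dagger & \phi_a^\dagger\end{pmatrix}\begin{pmatrix}\phi_a\\ \phi_b\end{pmatrix} = \psi^\dagger\gamma^0\psi$$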

Notice that the matrix transposing the two halves is the same as $\gamma^0$ (in the Weyl basis (7)), so we are led to hypothesize that instead of considering just the leading factor $\psi^\dagger$ we should consider $\bar\psi = \psi^\dagger\gamma^0$. In effect, the leading factor is subjected to three transformations, consisting of the overall transposition, the complex conjugation of the elements, and the transposition of the two halves. The four components of our probability density-current vector are then hypothesized to be

$$\rho^\mu = \bar\psi\gamma^\mu\psi = \psi^\dagger\gamma^0\gamma^\mu\psi,\qquad \mu = 0, 1, 2, 3$$

To investigate whether this gives a logically self-consistent theory, recall that the transposed complex conjugate of our transformed state vector ψ' in terms of ψ can be found from equation (21), noting that the transpose of a product is the product of the transposes in reverse order. Thus we have

$$\psi'^\dagger\gamma^0 = \psi^\dagger(I + M)^\dagger\gamma^0 = \psi^\dagger\left(I + M^\dagger\right)\gamma^0$$

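Now, the matrix M is not anti-Hermitian, so we can’t simply write the second factor on the right side as (I − M). However, it’s easy to verify that each term of M is made anti-Hermitian by multiplying it on both sides by $\gamma^0$ (because $\gamma^0$ is Hermitian, each $\gamma^k$ is anti-Hermitian, and hence $\gamma^0(\gamma^\mu)^\dagger\gamma^0 = \gamma^\mu$). Consequently we have

$$\gamma^0M^\dagger\gamma^0 = -M,\qquad\text{so}\qquad \bar\psi' = \psi'^\dagger\gamma^0 = \psi^\dagger\gamma^0\gamma^0\left(I + M^\dagger\right)\gamma^0 = \bar\psi\left(I + \gamma^0M^\dagger\gamma^0\right) = \bar\psi\,(I - M)$$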

Making use of this relation, and neglecting second-order terms in the infinitesimal matrix M, we can evaluate the four scalar quantities given by

$$\bar\psi'\gamma^\mu\psi' = \bar\psi\,(I - M)\,\gamma^\mu\,(I + M)\,\psi$$

Substituting from (20) this becomes 

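$$\bar\psi'\gamma^\mu\psi' = \bar\psi\,(I - M)(I + M)\left(\gamma^\mu - \gamma^\nu\Lambda_\nu{}^\mu\right)\psi = \bar\psi\gamma^\mu\psi - \Lambda_\nu{}^\mu\,\bar\psi\gamma^\nu\psi = \bar\psi\gamma^\mu\psi + \Lambda^\mu{}_\nu\,\bar\psi\gamma^\nu\psi$$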

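where, in the last step, we’ve made use of the fact that an infinitesimal Lorentz transformation Λ is antisymmetric, in the form $\Lambda^\mu{}_\nu = -\Lambda_\nu{}^\mu$. Raising the index in (15) shows that this is exactly how the contravariant components of a 4-vector transform, so the four scalar quantities defined by the above equation transform as the components of a 4-vector, with the components (relative to this basis) given by

$$\rho^0 = \phi_a^\dagger\phi_a + \phi_b^\dagger\phi_b,\qquad \rho^k = \phi_a^\dagger\sigma_k\phi_a - \phi_b^\dagger\sigma_k\phi_b\quad (k = 1, 2, 3),\qquad (\sigma_1, \sigma_2, \sigma_3) = (\sigma_x, \sigma_y, \sigma_z)$$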
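The transformation law of the density-current can be tested directly by applying $\psi' = (I + \varepsilon M)\psi$ to a random spinor. Because $\rho^\mu$ is quadratic in ε, a central difference in ε gives the first-order change exactly, up to rounding. This minimal NumPy sketch (Weyl basis (7), random inputs, $\bar\psi' = \psi'^\dagger\gamma^0$ as in the text) compares that change with $-\Lambda_\nu{}^\mu\rho^\nu$ and also confirms that $\rho^\mu$ is real with $\rho^0 = \psi^\dagger\psi$.

```python
import numpy as np

# Weyl-basis generators of (7)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))
gammas = [g0, g1, g2, g3]
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Random infinitesimal Lorentz generator and the corresponding M
rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4))
L_up = B - B.T
L_low = eta @ L_up @ eta
L_mixed = eta @ L_up  # L_mixed[nu, mu] = Lambda_nu^mu
M = 0.25 * sum(L_low[a, b] * gammas[a] @ gammas[b] for a in range(4) for b in range(4))


def current(psi):
    """rho^mu = psi-bar gamma^mu psi, with psi-bar = psi^dagger gamma^0."""
    bar = psi.conj() @ g0
    return np.array([bar @ g @ psi for g in gammas])


psi = rng.normal(size=4) + 1j * rng.normal(size=4)
rho = current(psi)

# rho(eps) for psi' = (I + eps M) psi is quadratic in eps, so this central
# difference equals the first-order coefficient up to rounding error.
I4 = np.eye(4)
eps = 1e-3
d_rho = (current((I4 + eps * M) @ psi) - current((I4 - eps * M) @ psi)) / (2 * eps)
predicted = -L_mixed.T @ rho  # -Lambda_nu^mu rho^nu

print(np.allclose(rho.imag, 0), np.isclose(rho[0].real, np.vdot(psi, psi).real))  # True True
print(np.allclose(d_rho, predicted))  # True
```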

The “time” component $\rho^0$ of this 4-vector reduces to $\psi^\dagger\psi = |\psi_1|^2 + |\psi_2|^2 + |\psi_3|^2 + |\psi_4|^2$. This positive-definite quantity represents the probability density, which transforms (as it should) under a Lorentz transformation in the same way as the time coordinate of a timelike interval. Thus Dirac achieved his objective of finding a relativistic and positive-definite probability density for the electron. The “space” components $\rho^1$, $\rho^2$, $\rho^3$ represent the probability current, i.e., the probability of the particle crossing a given plane (normal to a given direction) per unit area and per unit time. The invariant squared magnitude of the density-current is

$$\rho_\mu\rho^\mu = \left(\rho^0\right)^2 - \left(\rho^1\right)^2 - \left(\rho^2\right)^2 - \left(\rho^3\right)^2$$

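If we express each of the four components of the wave function in polar form, with real amplitudes $R_j \ge 0$ and real phases $\theta_j$,

$$\psi_j = R_j\,e^{i\theta_j},\qquad j = 1, 2, 3, 4$$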

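then the squared magnitude of the density-current (in this basis) can be written as

$$\rho_\mu\rho^\mu = 4\left[R_1^2R_3^2 + R_2^2R_4^2 + 2R_1R_2R_3R_4\cos\left(\theta_1 - \theta_2 - \theta_3 + \theta_4\right)\right]$$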

This is obviously real-valued, and it is never negative (though it can vanish, so it is positive semi-definite), because the minimum value it can take is with the cosine equal to −1, in which case the quantity factors as the square of a real number, $4(R_1R_3 - R_2R_4)^2$.

Another invariant quantity is given by

$$\bar\psi'\psi' = \bar\psi\,(I - M)(I + M)\,\psi = \bar\psi\psi$$

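where, as always, we omit second-order terms in the infinitesimal M. Also, since $\gamma_5$ anticommutes with each of the $\gamma^\mu$, it commutes with each product $\gamma^\alpha\gamma^\beta$ and hence with M, and therefore we have still another invariant, given by

$$\bar\psi'\gamma_5\psi' = \bar\psi\,(I - M)\,\gamma_5\,(I + M)\,\psi = \bar\psi\gamma_5\,(I - M)(I + M)\,\psi = \bar\psi\gamma_5\psi$$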

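These last two invariants can be regarded as orthogonal components of the density-current vector, since their magnitudes are given by

$$\left(\bar\psi\psi\right)^2 = 4\left[R_1R_3\cos(\theta_3 - \theta_1) + R_2R_4\cos(\theta_4 - \theta_2)\right]^2,\qquad \left|\bar\psi\gamma_5\psi\right|^2 = 4\left[R_1R_3\sin(\theta_3 - \theta_1) + R_2R_4\sin(\theta_4 - \theta_2)\right]^2$$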

and hence the invariant magnitudes are related to the magnitude of the density-current by

$$\rho_\mu\rho^\mu = \left(\bar\psi\psi\right)^2 + \left|\bar\psi\gamma_5\psi\right|^2$$
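The expressions for the invariant magnitude can be compared numerically. This minimal NumPy sketch (Weyl basis (7), random spinor, random antisymmetric Λ) evaluates $\rho_\mu\rho^\mu$ directly, from the polar form, from the intermediate expression $4|\psi_1^*\psi_3 + \psi_2^*\psi_4|^2$ (our own rewriting, not taken from the text), and from the two bilinears. It also checks that $\bar\psi\psi$ and $\bar\psi\gamma_5\psi$ have no first-order change under $\psi' = (I + \varepsilon M)\psi$.

```python
import numpy as np

# Weyl-basis generators of (7) and gamma_5
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))
gammas = [g0, g1, g2, g3]
g5 = 1j * g0 @ g1 @ g2 @ g3
eta = np.diag([1.0, -1.0, -1.0, -1.0])

rng = np.random.default_rng(5)
B = rng.normal(size=(4, 4))
L_low = eta @ (B - B.T) @ eta  # Lambda_{alpha beta}, antisymmetric
M = 0.25 * sum(L_low[a, b] * gammas[a] @ gammas[b] for a in range(4) for b in range(4))
psi = rng.normal(size=4) + 1j * rng.normal(size=4)


def bilinears(psi):
    """(rho^mu, psi-bar psi, psi-bar gamma_5 psi) with psi-bar = psi^dagger gamma^0."""
    bar = psi.conj() @ g0
    rho = np.array([bar @ g @ psi for g in gammas])
    return rho, bar @ psi, bar @ g5 @ psi


rho, s, p5 = bilinears(psi)
rho_sq = (rho[0]**2 - rho[1]**2 - rho[2]**2 - rho[3]**2).real

R, th = np.abs(psi), np.angle(psi)  # psi_j = R_j exp(i theta_j)
polar = 4 * (R[0]**2 * R[2]**2 + R[1]**2 * R[3]**2
             + 2 * R[0] * R[1] * R[2] * R[3] * np.cos(th[0] - th[1] - th[2] + th[3]))
z = np.conj(psi[0]) * psi[2] + np.conj(psi[1]) * psi[3]

print(np.isclose(rho_sq, polar), np.isclose(rho_sq, 4 * abs(z)**2),
      np.isclose(rho_sq, abs(s)**2 + abs(p5)**2))  # True True True

# First-order change of the two scalar invariants (central difference in eps)
I4, eps = np.eye(4), 1e-3
_, s_plus, p_plus = bilinears((I4 + eps * M) @ psi)
_, s_minus, p_minus = bilinears((I4 - eps * M) @ psi)
print(np.isclose((s_plus - s_minus) / (2 * eps), 0),
      np.isclose((p_plus - p_minus) / (2 * eps), 0))  # True True
```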

Regarding the above derivations, it’s worth noting that Dirac originally chose a basis in which the coefficient of $\partial_0$ was the identity matrix I, and the coefficient of m, which he called $\gamma_m$, was not equal to I. This is in contrast to our discussion above, where we’ve done just the opposite, i.e., we’ve chosen a basis in which the coefficient of m is the identity matrix, and $\gamma^0$ is not. In Dirac’s basis, since the coefficient of $\partial_0$ is I, the matrix M itself is anti-Hermitian, which simplifies some of the results, but is less consistent with modern usage.

As discussed previously, the four components of the wave function ψ can be split into two sets, $\phi_a$ and $\phi_b$, each consisting of two components, and these two sets are redundant, so the particle can be represented by just two wave functions, corresponding to the two possible spin states for a given basis. However, in another sense, the fact that the full wave function has four components instead of just two is very significant, because the transformation from one basis to another can make use of these four degrees of freedom to produce an unexpected (in 1927) consequence. Recall that the matrices $\gamma_j$, j = 0, 1, 2, 3, together with the identity matrix generate a group of 16 signed “unit” matrices, and this same group can be generated by some other subsets. For example, the group of units (4) can be generated by the four matrices (not to be confused with the Dirac-basis matrices of (11))

$$\gamma'_0 = \gamma_0\gamma_2,\qquad \gamma'_1 = \gamma_1\gamma_2,\qquad \gamma'_2 = \gamma_2,\qquad \gamma'_3 = \gamma_2\gamma_3$$

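Since these four γ' matrices satisfy all the required conditions (3), they represent an equally valid basis for Dirac’s equation (6), which can therefore be written in the form

$$\left(\gamma'_0\frac{\partial}{\partial t} + \gamma'_1\frac{\partial}{\partial x} + \gamma'_2\frac{\partial}{\partial y} + \gamma'_3\frac{\partial}{\partial z} + \frac{im}{\hbar}\right)\psi = 0 \tag{26}$$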

Notice that $\gamma_2$ is the only one of the original γ matrices that is imaginary, and it appears as a factor in each of the γ' matrices, so all of the γ' matrices are imaginary. Now, if we apply complex conjugation to every quantity appearing in an equality, replacing each i with −i, the equality still holds. Noting that every term in (26) has a factor of i, it follows that the complex conjugate of ψ is also a solution of (26). For an electron, this complex conjugate solution represents the “positron”, i.e., the antiparticle of the electron. If the interaction with an electromagnetic field is included in the Dirac equation, the charge of the particle is negated for the conjugate solution, so the positron has positive electric charge, but the same mass and spin attributes as an electron.
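The properties claimed for the γ' matrices can be checked in the same way. This minimal NumPy sketch (Weyl basis (7)) confirms that the products listed above satisfy (3) and are purely imaginary. Assuming a plane wave $\psi = u\,e^{-i(Et - \mathbf{p}\cdot\mathbf{x})/\hbar}$, it also shows in momentum space that if u solves (26) with energy E and momentum p, then $u^*$ solves it with E and p reversed, which is exactly what the conjugate wave function $\psi^* = u^*e^{+i(Et - \mathbf{p}\cdot\mathbf{x})/\hbar}$ requires. The helper name `plane_wave_matrix` is ours.

```python
import itertools
import numpy as np

# Weyl-basis generators of (7)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g0 = np.block([[Z2, I2], [I2, Z2]])
g1, g2, g3 = (np.block([[Z2, -s], [s, Z2]]) for s in (sx, sy, sz))
eta = np.diag([1.0, -1.0, -1.0, -1.0])

majorana = [g0 @ g2, g1 @ g2, g2, g2 @ g3]  # the gamma' matrices listed above

relations_ok = all(
    np.allclose(majorana[a] @ majorana[b] + majorana[b] @ majorana[a],
                2 * eta[a, b] * np.eye(4))
    for a, b in itertools.product(range(4), repeat=2)
)
imaginary = all(np.allclose(g.real, 0) for g in majorana)


def plane_wave_matrix(E, p, m):
    """(26) for psi = u exp(-i(Et - p.x)/hbar) reduces to this matrix acting on u."""
    return (E * majorana[0] - p[0] * majorana[1] - p[1] * majorana[2]
            - p[2] * majorana[3] - m * np.eye(4))


m, p = 1.0, np.array([0.3, -1.2, 0.5])
E = np.sqrt(m**2 + p @ p)
_, s, vh = np.linalg.svd(plane_wave_matrix(E, p, m))
u = vh[-1].conj()  # a solution: plane_wave_matrix(E, p, m) @ u = 0
# psi* is a plane wave with energy -E and momentum -p and amplitude u*
conj_ok = np.allclose(plane_wave_matrix(-E, -p, m) @ u.conj(), 0)
print(relations_ok, imaginary, conj_ok)  # True True True
```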

Incidentally, Dirac originally thought his equation applied to every particle of mass m, and hence that all massive particles must have “spin 1/2”. This is indeed the case for the massive particles that were known in 1928, namely electrons and protons (and also for the neutron, discovered in 1932), but other kinds of particles (including photons) are known to have spins different from ħ/2. Dirac’s explanation for this is interesting:

The answer is to be found in a hidden assumption in our work. Our argument is valid only provided the position of the particle is an observable. If this assumption holds, the particle must have a spin angular momentum of half a quantum. For those particles that have a different spin the assumption must be false and any dynamical variables x_{1}, x_{2}, x_{3} that may be introduced to describe the position of the particle cannot be observables in accordance with our general theory. For such particles there is no true Schrödinger representation. One might be able to introduce a quasi wave function involving the dynamical variables x_{1}, x_{2}, x_{3}, but it would not have the correct physical interpretation of a wave function—that the square of its modulus gives the probability density. For such particles there is still a momentum representation, which is sufficient for practical purposes. 

Dirac’s theory of the electron was remarkably successful, especially in its prediction of the positron, which was discovered experimentally just two years after Dirac published his prediction. However, to account for the fact that matter doesn’t degenerate into negative-energy states, Dirac found it necessary to propose a “sea” of negative-energy electrons, and then invoke the Pauli exclusion principle, arguing that all the negative-energy states were occupied. The positron was then conceived as a “hole” in the Dirac sea. In retrospect, this “hole” explanation seems unconvincing, because we now know that all particles, not just fermions for which the Pauli exclusion principle applies, are accompanied by antiparticles, so the “hole” interpretation doesn’t work in general. Weinberg asked Dirac about this in 1972, and Dirac replied that he didn’t regard massive bosons as “important”. It isn’t clear what he meant by this (perhaps he meant that such particles are not elementary?), although Weinberg notes that a few years later Dirac acknowledged that “for bosons we no longer have the picture of a vacuum with negative energy states filled up… the whole theory becomes more complicated”, presumably referring to the creation and annihilation operators of modern quantum field theory. The modern view seems to be that Dirac’s prediction of the positron was not entirely well-founded, although it certainly does emerge rather unavoidably from consideration of the two roots of $E^2 - p^2 = m^2$.

The most profound implication of Dirac’s equation was that any relativistic description of a particle necessarily involves not just the wave function of a single particle, but multiple wave functions representing the potential for other particles. The first quantization in physics gave a field representation for all the possible states of a given particle, by treating the observable properties (such as position and momentum) as operators on a wave function. The “second quantization”, suggested by Dirac’s equation, then consists of treating this quantum wave function itself as an operator, giving a field representation of all the possible quantum fields. We might say that second quantization considers “the field of all fields”. It’s remarkable that general relativity (the other fundamental theory of physics developed in the early 20th century) also involves a consideration of the field of all fields, albeit in a completely different sense. In both cases this leads to nonlinearities, and in both cases the theories are found to entail infinities if we regard them as exact to all orders, rather than just as low-energy “effective” field theories.
