Principles of special relativity
• Introduction.
Here we shall discuss very succinctly the basic principles of special relativity. There are two reasons for doing this. First, when introducing optical devices, we saw that the electron-photon perturbation Hamiltonian is given by (see Lecture Notes, page 125, Eq. (253); page 281, Eq. (726)):
$$ H_{\rm phot} = \frac{ie\hbar}{mc}\,\mathbf{A}\cdot\nabla . \tag{A1} $$
This interaction term can be obtained by starting from the free-electron Hamiltonian $p^2/(2m)$ and replacing the electron momentum $\mathbf{p}$ with $\mathbf{p} - e\mathbf{A}/c$. (The factor of $c$ in the denominator appears when using Gaussian units, which are more convenient in this context and which we shall use here.) The reason behind this substitution relies on some basic principles of special relativity, and it is worth understanding how this comes about.
The second reason stems from the recent popularity of graphene as a ‘promising’ material for nanoelectronic applications. The band structure of graphene shows a unique feature: At the Fermi level the dispersions of the valence and conduction bands cross as straight lines. This E(k) relation resembles the dispersion of relativistic massless particles, and it has triggered interest in exploiting what is known about relativistic massless spin-1/2 particles in order to understand the electronic properties of graphene. Such particles are described by a spin-1/2 relativistic wave equation (the Dirac equation for zero-mass spin-1/2 fermions), so it is interesting to see how the Schrödinger equation can be generalized to satisfy the principle of relativistic invariance.
• Galilean invariance and Maxwell’s equations.
As soon as Maxwell’s equations were formulated, it became clear that there was a major difference with respect to Newton’s laws. Let’s start with Newton’s second law,
$$ \mathbf{F} = m\mathbf{a} \quad\text{or}\quad \mathbf{F} = m\,\frac{d^2\mathbf{x}}{dt^2} , \tag{A2} $$
and consider this same equation as it would be written by somebody (called an ‘observer’) who is moving with respect to us with uniform velocity $\mathbf{u}$. Let’s assume that we and the other observer use reference frames with parallel x, y, and z axes, that the other observer moves along the x axis, so that $\mathbf{u} = (u,0,0)$, and that the origins of the two reference frames coincide at a fixed instant in time, which we take to be $t = 0$. Then, calling $(x, y, z)$ the coordinates in our frame and $(x', y', z')$ those in the other observer’s frame, an object located at a point $(x, y, z)$ in our frame will be located at a point $(x', y', z')$ in the observer’s frame, such that:
$$ x' = x - ut , \qquad y' = y , \qquad z' = z . \tag{A3} $$
Therefore, expressing Newton’s second law in the ‘primed’ frame,
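$$ \mathbf{F}' = m\,\frac{d^2\mathbf{x}'}{dt^2} = m\,\frac{d^2(\mathbf{x} - \mathbf{u}t)}{dt^2} = m\left(\frac{d^2\mathbf{x}}{dt^2} - \frac{d\mathbf{u}}{dt}\right) = m\,\frac{d^2\mathbf{x}}{dt^2} . \tag{A4} $$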
In other words, Newton’s law retains the same algebraic form in all frames which are moving with uniform velocity with respect to our frame. This principle can be generalized by saying that the laws of mechanics are valid in all ‘inertial frames’. An observer, by performing experiments, cannot tell whether he/she is moving with respect to other inertial frames. This is the principle of Galilean relativity, since it was proposed (noted, discovered, invented?... the choice of a term is a matter of deep philosophical discussions) by Galileo.
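Consider now Maxwell’s equations. For simplicity, let’s just consider a wave equation, written in the primed frame:
$$ \left(\nabla'^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t'^2}\right)\psi = 0 . \tag{A5} $$
Applying the Galilean transformation Eq. (A3) with $t' = t$, so that $\partial\psi/\partial x' = \partial\psi/\partial x$ and $\partial\psi/\partial t' = \partial\psi/\partial t + u\,\partial\psi/\partial x$, we find:
$$ \left(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{2u}{c^2}\frac{\partial^2}{\partial x\,\partial t} - \frac{u^2}{c^2}\frac{\partial^2}{\partial x^2}\right)\psi = 0 . \tag{A6} $$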
What a mess! The form of the equation has been completely altered by the transformation! In hindsight, we already knew it had to be so: After all, magnetic fields caused by moving charges must disappear when we use a frame in which the charges are at rest. Therefore, the E and B fields do not transform correctly under the Galilean transformation Eq. (A3). Moreover, the Lorentz force depends explicitly on the velocity of the particle, so that the form of the equation will differ in a different inertial frame. Historically, this is also related to the difficulty of understanding electromagnetic waves: Sound waves are oscillations of the medium in which they propagate. But electromagnetic waves are oscillations of what?
In order to fix the situation we have three alternatives we can choose from:
1. Maxwell’s equations are wrong. The correct equations, yet to be discovered, are invariant under Galilean transformations.
2. Galilean invariance is valid for mechanics, not for electromagnetism. This is the historical solution before Einstein: The ‘ether’ determines the existence of the ‘absolute frame’ in which the ether is at rest and Maxwell’s equations hold.
3. Galilean invariance is wrong. There is a more general invariance – yet to be discovered – which preserves the form of Maxwell’s equations. Classical mechanics is incorrect and must be reformulated so that it is invariant under this new transformation.
Having to choose between trashing Maxwell (option 1) or Newton (option 3), physicists chose the easier option 2. Einstein, instead, decided to follow the third option, guided by two postulates:
1. Postulate of relativity: All physical laws must ‘look the same’ in all frames moving with uniform velocity with respect to each other.
2. Postulate of the constancy of the speed of light: The speed of light is the same (numerically the same!) independent of the velocity of the observer or of its source. This stems logically from the Michelson-Morley experiment of 1887, but the result could also have been explained by ‘saving’ the ether and using the Lorentz-FitzGerald contraction.
Armed with these postulates, Einstein set out to build a new set of transformations between inertial frames. Maxwell’s equations are invariant under this new set of transformations, but Newtonian mechanics has to be modified: If two frames move at a relative speed much smaller than the speed of light, the ‘new’ transformations approach the usual Galilean transformation and Newton is approximately correct. But for relative velocities approaching the speed of light, the laws of mechanics deviate enormously from Newton’s laws.
It is impossible to do justice to special relativity in such a short time. We shall only discuss those few concepts which are required to answer our original question of why we perform the substitution $\mathbf{p} \to \mathbf{p} - (e/c)\mathbf{A}$.
• Lorentz transformations.
Consider the same two frames (the ‘primed’ frame $K'$ and the ‘unprimed’ frame $K$) considered above. Assume now that at $t = 0$ (the time at which the origins of the two frames coincided... at least when looking at our clock...!) a ray of light was emitted from the origin. Since light travels at the same speed in both frames, we must have for the wavefronts of the emitted light:
$$ c^2t^2 - x^2 - y^2 - z^2 = 0 \quad\text{and}\quad c^2t'^2 - x'^2 - y'^2 - z'^2 = 0 , \tag{A7} $$
so that, assuming that space-time is isotropic and homogeneous,
$$ c^2t'^2 - x'^2 - y'^2 - z'^2 = \lambda(u)\,[c^2t^2 - x^2 - y^2 - z^2] . \tag{A8} $$
The function $\lambda(u)$ is a possible velocity-dependent change of scale between the two frames. However, since going from $K'$ to $K$ must involve the same transformation as going from $K$ to $K'$ with a sign-flip for $u$, we must have $\lambda = 1$. Using the notation $x_0 = ct$, $x_1 = x$, $x_2 = y$, and $x_3 = z$, Eq. (A8) is satisfied if:
$$ x'_0 = \gamma(x_0 - \beta x_1) , \qquad x'_1 = \gamma(x_1 - \beta x_0) , \qquad x'_2 = x_2 , \qquad x'_3 = x_3 , \tag{A9} $$
where
$$ \beta = \frac{u}{c} \quad\text{and}\quad \gamma = (1 - \beta^2)^{-1/2} . \tag{A10} $$
The transformations Eq. (A9) are called ‘Lorentz transformations’. Obviously the inverse transformations read:
$$ x_0 = \gamma(x'_0 + \beta x'_1) , \qquad x_1 = \gamma(x'_1 + \beta x'_0) , \qquad x_2 = x'_2 , \qquad x_3 = x'_3 . \tag{A11} $$
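The transformations (A9) and (A11) are easy to check numerically. The following is a minimal sketch, not part of the original notes: the function names `gamma`, `boost`, and `interval` and the sample event are illustrative assumptions. It applies Eq. (A9), verifies that the quadratic form of Eq. (A8) is unchanged (λ = 1), and recovers the original event by boosting back with −β, as in Eq. (A11).

```python
import math

def gamma(beta):
    """Lorentz factor of Eq. (A10) for beta = u/c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def boost(x, beta):
    """Apply Eq. (A9) to x = (x0, x1, x2, x3) for a frame moving along x1."""
    g = gamma(beta)
    x0, x1, x2, x3 = x
    return (g * (x0 - beta * x1), g * (x1 - beta * x0), x2, x3)

def interval(x):
    """Quadratic form x0^2 - x1^2 - x2^2 - x3^2 appearing in Eq. (A8)."""
    x0, x1, x2, x3 = x
    return x0 * x0 - x1 * x1 - x2 * x2 - x3 * x3

event = (5.0, 1.0, 2.0, -3.0)   # arbitrary sample event, with x0 = c*t
beta = 0.6                      # arbitrary sample relative speed u/c

primed = boost(event, beta)     # coordinates in the primed frame, Eq. (A9)
back = boost(primed, -beta)     # inverse transformation, Eq. (A11)

print("primed coordinates:", primed)
print("interval in K and K':", interval(event), interval(primed))
print("after boosting back:", back)
```

The inverse is implemented simply as a boost with the opposite sign of β, which is the content of the remark that led to λ = 1.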
Note that, unlike the Galilean transformations, now time and space transform together: Simultaneous events in one frame will not, in general, be simultaneous in another frame. Indeed consider two events $(ct_1,\mathbf{x}_1)$ and $(ct_2,\mathbf{x}_2)$. Thanks to Eq. (A8), their ‘distance’
$$ s_{12}^2 = c^2(t_1 - t_2)^2 - |\mathbf{x}_1 - \mathbf{x}_2|^2 \tag{A12} $$
is an ‘invariant’ (that is, it’s the same in all inertial frames). If $s_{12}^2 > 0$ the separation between the events is said to be ‘time-like’: It is always possible to find a transformation (actually with $\beta = |\mathbf{x}_1 - \mathbf{x}_2|/(c|t_1 - t_2|)$) such that in the transformed frame the two events are at the same spatial location, separated only by time. If $s_{12}^2 < 0$ the separation between the events is said to be ‘space-like’: It is always possible to find a Lorentz transformation such that in the transformed frame the two events are simultaneous, separated only spatially. If, finally, $s_{12}^2 = 0$, the separation is said to be ‘light-like’: One event lies on the ‘light-cone’ of the other.
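The classification above can be illustrated with a short sketch. It is not from the original notes: the helper names are hypothetical, and the events are restricted to one spatial dimension for brevity. For the space-like case the boost velocity β = cΔt/Δx is used; this is the counterpart of the β quoted above for the time-like case and is stated here as an assumption.

```python
import math

def interval_squared(dx0, dx1):
    """s^2 of Eq. (A12) for separations dx0 = c*(t1 - t2) and dx1 = x1 - x2."""
    return dx0 * dx0 - dx1 * dx1

def classify(s2, tol=1e-12):
    """Label a separation as time-like, space-like, or light-like."""
    if s2 > tol:
        return "time-like"
    if s2 < -tol:
        return "space-like"
    return "light-like"

def boost_separation(dx0, dx1, beta):
    """Transform a separation with Eq. (A9); the transformation is linear."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * (dx0 - beta * dx1), g * (dx1 - beta * dx0)

# Time-like pair: beta = dx1/dx0 brings both events to the same place.
t_like = (5.0, 3.0)
same_place = boost_separation(*t_like, beta=t_like[1] / t_like[0])

# Space-like pair: beta = dx0/dx1 makes the two events simultaneous.
s_like = (3.0, 5.0)
simultaneous = boost_separation(*s_like, beta=s_like[0] / s_like[1])

print(classify(interval_squared(*t_like)), same_place)
print(classify(interval_squared(*s_like)), simultaneous)
print(classify(interval_squared(2.0, 2.0)))
```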
• Proper time and time dilatation.
Since ‘time’ has become an observer-dependent quantity, when dealing with moving particles it is convenient to consider the time in the frame in which the particle is at rest. If v(t) is the velocity of the moving particle in ‘our’ frame, let’s consider the invariant:
$$ ds^2 = dx_\mu dx^\mu = c^2dt^2 - dx^2 - dy^2 - dz^2 = c^2dt^2 - v^2dt^2 = c^2(1 - v^2/c^2)\,dt^2 = c^2dt^2/\gamma^2 . \tag{A13} $$
Since this quantity is invariant (that is, it is numerically the same in all inertial frames), in the frame in which the particle is at rest it will be $ds^2 = c^2d\tau^2$, where $\tau$ is the time in that frame. This time is called ‘proper time’. If the particle travels over a time interval $\tau_2 - \tau_1$ in its proper time, as seen by us the time interval will stretch to the interval $t_2 - t_1$ obtained by integrating Eq. (A13) along the particle trajectory:
$$ t_2 - t_1 = \int_{\tau_1}^{\tau_2} d\tau\,\gamma(\tau) = \int_{\tau_1}^{\tau_2} \frac{d\tau}{\sqrt{1 - v(\tau)^2/c^2}} . \tag{A14} $$
Since $\gamma > 1$ for a moving particle, the time interval we observe, $t_2 - t_1$, is longer than the proper time interval $\tau_2 - \tau_1$. This has been experimentally verified: The µ-mesons (actually, leptons) produced by cosmic-ray hits in the upper atmosphere often reach the ground. Since the lifetime of the µ-meson is about 2.2 µs, even at the speed of light the particle could not travel more than about 660 m before decaying. Yet, they can easily be detected after having traveled distances more than two orders of magnitude longer (the thickness of the atmosphere, of the order of $10^5$ m). This is because their lifetime, as observed by us, is stretched enormously, as these particles travel at speeds approaching the speed of light.
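The numbers quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it treats the 2.2 µs lifetime as a fixed travel time in the rest frame (a mean decay length, not a survival probability), and the constant names are illustrative. It computes the naive bound cτ and the Lorentz factor needed for the dilated path γβcτ to match the quoted atmospheric thickness, which comes out at roughly 150.

```python
import math

C = 2.998e8          # speed of light in m/s
TAU_MUON = 2.2e-6    # muon lifetime in its rest frame in s (value quoted in the text)
ATMOSPHERE = 1.0e5   # distance to cover in m (order of magnitude quoted in the text)

def mean_path(beta):
    """Distance covered in our frame during one dilated lifetime: u * gamma * tau."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return beta * C * g * TAU_MUON

# Without time dilation the path cannot exceed c * tau.
naive_limit = C * TAU_MUON

# Solve gamma * beta * c * tau = ATMOSPHERE, using gamma * beta = sqrt(gamma^2 - 1).
gamma_needed = math.sqrt(1.0 + (ATMOSPHERE / naive_limit) ** 2)
beta_needed = math.sqrt(1.0 - 1.0 / gamma_needed ** 2)

print(f"c * tau            = {naive_limit:.0f} m")
print(f"gamma needed       = {gamma_needed:.1f}")
print(f"1 - beta needed    = {1.0 - beta_needed:.2e}")
print(f"path at that speed = {mean_path(beta_needed):.0f} m")
```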
• Lorentz contraction.
Consider a rod of length $L'$ at rest in the $K'$ frame. Let the ends of the rod be at $x' = x'_1$ and $x' = x'_2$, so that $L' = x'_2 - x'_1$. What is the length $L$ of the rod in our $K$ frame? By the Lorentz transformations, Eq. (A9) (which we must use since when we measure the length of the rod we measure the positions of its ends at the same time in our frame), we have:
$$ L = x_2 - x_1 = \frac{1}{\gamma}(x'_2 - x'_1) = \frac{L'}{\gamma} < L' . \tag{A15} $$
The rod in our frame appears shorter than in its rest frame. This is the Lorentz-FitzGerald contraction which was postulated (without proof or arguments behind) in order to explain the Michelson-Morley experiment.
• Addition of velocities.
Let’s consider the Lorentz transformation
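$$ dx_0 = \gamma(dx'_0 + \beta\,dx'_1) , \qquad dx_1 = \gamma(dx'_1 + \beta\,dx'_0) , \qquad dx_2 = dx'_2 , \qquad dx_3 = dx'_3 . \tag{A16} $$
In our $K$ frame, the velocity $v = c\,dx_1/dx_0$ of a particle moving with velocity $v' = c\,dx'_1/dx'_0$ in the $K'$ frame will be:
$$ v = c\,\frac{dx_1}{dx_0} = c\,\frac{dx'_1 + \beta\,dx'_0}{dx'_0 + \beta\,dx'_1} = c\,\frac{dx'_0\,(dx'_1/dx'_0 + \beta)}{dx'_0\,(1 + \beta\,dx'_1/dx'_0)} = \frac{v' + u}{1 + v'u/c^2} . \tag{A17} $$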
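Note how the particle velocity $v'$ and the frame velocity $u$ add, as seen by us: In the limit of small velocities, $v \approx v' + u$, as in the Galilean case, but however close $v'$ and $u$ come to $c$, the velocity seen by us cannot exceed $c$. This is consistent with the second postulate. Note that the 4-vector
$$ U^\mu = (\gamma c, \gamma\mathbf{u}) , \tag{A18} $$
where $\mathbf{u}$ now denotes the velocity of the particle and $\gamma = (1 - u^2/c^2)^{-1/2}$, transforms like the coordinate vector $x^\mu$.
Note that we shall use the contravariant (e.g., $x^\mu$) and covariant (e.g., $x_\mu = g_{\mu\nu}x^\nu$) notation. Using the metric (+,−,−,−) for $\mu = 0,\dots,3$ we have $x^\mu = (x_0, \mathbf{x})$ but $x_\mu = (x_0, -\mathbf{x})$. Thus $x_\mu x^\mu = x_0^2 - \mathbf{x}\cdot\mathbf{x}$.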
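The behaviour of the velocity-addition law, Eq. (A17), in its two limits can be seen in a small numerical sketch. The function name `add_velocities` and the sample values are illustrative and not part of the notes. The last line uses a standard identity not derived here: collinear velocities add like hyperbolic tangents of ‘rapidities’.

```python
import math

def add_velocities(v_prime, u, c=1.0):
    """Eq. (A17): velocity seen in K of a particle moving at v_prime in K'."""
    return (v_prime + u) / (1.0 + v_prime * u / (c * c))

# Small velocities (in units of c): the result is very close to the Galilean sum.
slow = add_velocities(1e-4, 2e-4)

# Large velocities: the result stays below c even though 0.9 + 0.9 > 1.
fast = add_velocities(0.9, 0.9)

# Light itself: v' = c gives v = c for any frame velocity u.
light = add_velocities(1.0, 0.5)

# Cross-check against the rapidity form tanh(atanh(v') + atanh(u)).
rapidity_form = math.tanh(math.atanh(0.3) + math.atanh(0.5))

print(slow, fast, light)
print(rapidity_form, add_velocities(0.3, 0.5))
```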
• 4-momentum.
In classical mechanics the momentum and energy of a particle are:
$$ E = E(0) + \tfrac{1}{2}mu^2 , \qquad \mathbf{p} = m\mathbf{u} . \tag{A19} $$
The term E(0) is a constant which refers to the rest-energy of the particle. It is usually ignored in non-relativistic discussions.
In order to generalize these concepts, we can start by looking for arbitrary functions of the velocity, $\mathcal{E}(u)$ and $\mathcal{M}(u)$, such that:
$$ E = \mathcal{E}(u) , \qquad \mathbf{p} = \mathcal{M}(u)\,\mathbf{u} , \tag{A20} $$
with the constraints (dictated by the fact that we want to recover Eq. (A19) in the limit $u \to 0$):
$$ \frac{\partial\mathcal{E}}{\partial u^2}(0) = \frac{m}{2} , \qquad \mathcal{M}(0) = m . \tag{A21} $$
The general expressions for the functions $\mathcal{E}$ and $\mathcal{M}$ can be obtained by analyzing the elastic collision of two identical particles and requiring momentum and energy conservation in two inertial frames $K$ and $K'$. We’ll skip the derivation (see Jackson, Classical Electrodynamics, Sec. 11.5) and simply state the result: The general form for these two functions consistent with the two postulates and with energy and momentum conservation is:
$$ \mathcal{E}(u) = \gamma mc^2 , \qquad \mathcal{M}(u) = \gamma m . \tag{A22} $$
Note that in the limit $u \to 0$, $E = \gamma mc^2 \to mc^2 + mu^2/2$, which is the classical result with a rest energy $mc^2$.
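The limit just quoted follows from the binomial expansion of γ. As a worked step (added here for convenience, keeping one more order than the text needs so that the size of the first relativistic correction is visible):

```latex
\gamma = \left(1 - \frac{u^2}{c^2}\right)^{-1/2}
       = 1 + \frac{1}{2}\frac{u^2}{c^2} + \frac{3}{8}\frac{u^4}{c^4}
         + \mathcal{O}\!\left(\frac{u^6}{c^6}\right) ,
\qquad
E = \gamma mc^2 = mc^2 + \frac{1}{2}mu^2 + \frac{3}{8}\frac{mu^4}{c^2} + \dots
```

The first term is the rest energy, the second is the Newtonian kinetic energy, and the third is smaller than the second by a factor of order $u^2/c^2$.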
From this we can define an energy-momentum 4-vector, pµ:
$$ p^\mu = (\gamma mc, \gamma m\mathbf{u}) = (E/c, \gamma m\mathbf{u}) = mU^\mu , \tag{A23} $$
having used Eq. (A18) in the last step.
The invariant length of the energy-momentum 4-vector is
$$ p_\mu p^\mu = p_0^2 - |\mathbf{p}|^2 = E^2/c^2 - \gamma^2m^2u^2 = \gamma^2(m^2c^2 - m^2u^2) = m^2c^2 . \tag{A24} $$
Finally, from this expression we can write the energy $E$ of a particle as:
$$ E = \sqrt{c^2p^2 + m^2c^4} . \tag{A25} $$
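Equations (A23)-(A25) can be checked numerically for any sub-luminal velocity. The sketch below is illustrative only: the function names and the sample mass and velocity are assumptions, and units with c = 1 are used by default.

```python
import math

def four_momentum(m, u, c=1.0):
    """Eq. (A23): p^mu = (gamma*m*c, gamma*m*u) for a 3-velocity u given as a tuple."""
    u2 = sum(ui * ui for ui in u)
    g = 1.0 / math.sqrt(1.0 - u2 / (c * c))
    return (g * m * c,) + tuple(g * m * ui for ui in u)

def invariant_length(p):
    """Eq. (A24): p_0^2 - |p|^2 with the (+,-,-,-) metric."""
    return p[0] ** 2 - sum(pi * pi for pi in p[1:])

def energy(m, p3, c=1.0):
    """Eq. (A25): E = sqrt(c^2 p^2 + m^2 c^4) from the 3-momentum p3."""
    p2 = sum(pi * pi for pi in p3)
    return math.sqrt(c * c * p2 + m * m * c ** 4)

m = 2.0                       # sample mass
u = (0.3, -0.4, 0.5)          # sample velocity in units of c, |u| < 1
p = four_momentum(m, u)

print("p_mu p^mu =", invariant_length(p), " expected m^2 c^2 =", m * m)
print("E from (A25) =", energy(m, p[1:]), " c * p_0 =", p[0])
```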
• Lorentz force.
Our goal now is to re-express the Lorentz force (recall that we are using Gaussian units here):
$$ \frac{d\mathbf{p}}{dt} = e\left(\mathbf{E} + \frac{\mathbf{u}}{c}\times\mathbf{B}\right) \tag{A26} $$
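in a manifestly covariant form and to find a Hamiltonian from which Eq. (A26) may be derived. Eq. (A26) is complemented by the power-balance equation $dE/dt = e\,\mathbf{u}\cdot\mathbf{E}$, that is, $dp_0/dt = (e/c)\,\mathbf{u}\cdot\mathbf{E}$ with $p_0 = E/c$, so that we can employ the 4-vector $p^\mu$ and the proper time $\tau$:
$$ \frac{d\mathbf{p}}{d\tau} = \gamma\frac{d\mathbf{p}}{dt} = e\gamma\mathbf{E} + e\gamma\frac{\mathbf{u}}{c}\times\mathbf{B} = \frac{e}{c}\left(U_0\mathbf{E} + \mathbf{U}\times\mathbf{B}\right) , \qquad \frac{dp_0}{d\tau} = \gamma\frac{dp_0}{dt} = \gamma\frac{e}{c}\,\mathbf{u}\cdot\mathbf{E} = \frac{e}{c}\,\mathbf{U}\cdot\mathbf{E} . \tag{A27} $$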
Defining the electromagnetic field tensor:
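$$ F^{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix} , \tag{A28} $$
we can write Eq. (A27) in the manifestly covariant form:
$$ \frac{dp^\mu}{d\tau} = \frac{e}{c}F^{\mu\nu}U_\nu . \tag{A29} $$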
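That the tensor contraction in Eq. (A29) reproduces the two equations (A27) can be verified component by component. The following sketch does this numerically for arbitrary sample fields; the helper names, the sample values, and the choice e/c = 1 are assumptions made for illustration.

```python
import math

def field_tensor(E, B):
    """F^{mu nu} of Eq. (A28) as a nested list, from 3-vectors E and B."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, -Bz, By],
            [Ey, Bz, 0.0, -Bx],
            [Ez, -By, Bx, 0.0]]

def lower_index(U):
    """U_nu = g_{nu mu} U^mu with the diagonal metric (+,-,-,-)."""
    return [U[0], -U[1], -U[2], -U[3]]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def covariant_force(E, B, U, e_over_c=1.0):
    """Right-hand side of Eq. (A29): (e/c) F^{mu nu} U_nu."""
    F, U_low = field_tensor(E, B), lower_index(U)
    return [e_over_c * sum(F[mu][nu] * U_low[nu] for nu in range(4))
            for mu in range(4)]

def three_vector_force(E, B, U, e_over_c=1.0):
    """Right-hand sides of Eq. (A27): time component first, then the 3-vector."""
    U0, Uvec = U[0], U[1:]
    UxB = cross(Uvec, B)
    time_part = e_over_c * sum(Uvec[i] * E[i] for i in range(3))
    return [time_part] + [e_over_c * (U0 * E[i] + UxB[i]) for i in range(3)]

# Sample fields and a sample velocity (units with c = 1).
E_field, B_field = (0.2, -1.0, 0.5), (0.7, 0.1, -0.3)
u = (0.1, 0.5, -0.2)
g = 1.0 / math.sqrt(1.0 - sum(ui * ui for ui in u))
U = [g] + [g * ui for ui in u]          # Eq. (A18) with c = 1

lhs = covariant_force(E_field, B_field, U)
rhs = three_vector_force(E_field, B_field, U)
print(lhs)
print(rhs)
```

Note that the minus signs of the lowered index $U_\nu$ are what turn the entries $-E_i$ of the first row into $+\mathbf{U}\cdot\mathbf{E}$.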
We must now find a Hamiltonian function whose dynamic equations (the Hamilton equations of motion) yield the Lorentz-force equation. To do this correctly it would be necessary to develop a bit of Lagrangian theory. So, here we follow a ‘pragmatic approach’: Let’s define the Hamiltonian:
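$$ H = \sqrt{c^2\left[\mathbf{p} - (e/c)\mathbf{A}\right]^2 + m^2c^4} + e\Phi , \tag{A30} $$
where $\Phi$ is the scalar potential. Considering that the particle velocity in terms of $\mathbf{p}$ and $\mathbf{A}$ is (as it follows from Lagrangian theory):
$$ \mathbf{u} = \frac{c\mathbf{p} - e\mathbf{A}}{\sqrt{\left[\mathbf{p} - (e/c)\mathbf{A}\right]^2 + m^2c^2}} , \tag{A31} $$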
with some algebra one can verify that the Hamilton equations of motion ($i = 1,2,3$):
$$ \frac{dp_i}{dt} = -\frac{\partial H}{\partial x_i} \tag{A32} $$
are equivalent to the Lorentz-force equation (the first of the two Eqns. (A27)). Comparing Eq. (A30) with Eq. (A25) we see that the interaction between a charged particle and the electromagnetic field has been accounted for by replacing the particle momentum $\mathbf{p}$ with $\mathbf{p} - (e/c)\mathbf{A}$, which is what we wanted to show.
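The connection back to Eq. (A1) can be sketched in a few lines. This is a sketch under assumptions that are not spelled out above: the non-relativistic limit of the Hamiltonian with the substituted momentum, the Coulomb gauge $\nabla\cdot\mathbf{A} = 0$ (so that $\mathbf{p}\cdot\mathbf{A}$ and $\mathbf{A}\cdot\mathbf{p}$ act identically on a wavefunction), the operator replacement $\mathbf{p} \to -i\hbar\nabla$, and neglect of the term quadratic in $\mathbf{A}$.

```latex
H - mc^2 - e\Phi \simeq \frac{1}{2m}\left(\mathbf{p} - \frac{e}{c}\mathbf{A}\right)^2
  = \frac{p^2}{2m}
    - \frac{e}{2mc}\left(\mathbf{p}\cdot\mathbf{A} + \mathbf{A}\cdot\mathbf{p}\right)
    + \frac{e^2A^2}{2mc^2} ,
\qquad
-\frac{e}{mc}\,\mathbf{A}\cdot\mathbf{p}
  \;\longrightarrow\; \frac{ie\hbar}{mc}\,\mathbf{A}\cdot\nabla .
```

The term linear in $\mathbf{A}$ is the electron-photon perturbation of Eq. (A1).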
• Relativistic wave equations.
One possible way to ‘derive’ (wrong term, but let’s ignore deep philosophical discussions!) Schrödinger’s equation is to write the energy-momentum dispersion for a free particle of mass $m$,
$$ E = \frac{p^2}{2m} , \tag{A33} $$
set $E \to i\hbar\,\partial/\partial t$, $\mathbf{p} \to -i\hbar\nabla$, and view the two sides of Eq. (A33) as operators acting on a vector $\psi$ in some Hilbert space $\mathcal{H}$. Thus:
$$ E = \frac{p^2}{2m} \;\to\; -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r},t) = i\hbar\frac{\partial}{\partial t}\psi(\mathbf{r},t) , \tag{A34} $$
which is the wave equation for non-relativistic particles. Searching for a relativistic wave equation, we should start by replacing the non-relativistic energy-momentum relation given by Eq. (A33) with its relativistic counterpart given by Eq. (A25). We immediately encounter a big problem: Eq. (A25) contains a square root. As innocent as this may seem, viewing this equation in terms of operators would yield a mathematically very nasty outcome: Taking the square root of a linear differential operator yields a nonlocal, unbounded (so, not continuous in the operator sense) operator. As a first, alternative approach, let’s avoid this ‘square-root’ issue by considering the square of Eq. (A25):
$$ E^2 = c^2p^2 + m^2c^4 \;\to\; -\hbar^2\frac{\partial^2}{\partial t^2}\phi(\mathbf{r},t) = -c^2\hbar^2\nabla^2\phi(\mathbf{r},t) + m^2c^4\phi(\mathbf{r},t) , \tag{A35} $$
or
$$ \left(c^2\hbar^2\,\Box^2 + m^2c^4\right)\phi(\mathbf{r},t) = 0 , \tag{A36} $$
where the d’Alembert operator is defined as $\Box^2 = (1/c^2)\,\partial^2/\partial t^2 - \nabla^2$ and $\phi$ represents the relativistic state vector. This is called the ‘Klein-Gordon equation’. In relativistic quantum mechanics it is customary to use units in which $c = \hbar = 1$, so the Klein-Gordon equation, Eq. (A36), is written simply as:
$$ (\Box^2 + m^2)\,\phi(x) = [\partial_\mu\partial^\mu + m^2]\,\phi(x) = 0 , \tag{A37} $$
where $x$ is the 4-vector $(\mathbf{r}, t)$, inner products have the metric (−,−,−,+), and $\Box^2$ takes the form $\partial^2/\partial t^2 - \nabla^2 = \partial_\mu\partial^\mu$, the latter form having been written using the notation $\partial_\mu = \partial/\partial x^\mu$ and also the ‘usual’ relativistic convention according to which the index $\mu$ runs over the 4 dimensions and the sum is implied over repeated indices. The Klein-Gordon equation is relativistically covariant, as desired, but its interpretation as a wave equation for a single particle (so, not as a second-quantization field equation) is problematic for two major reasons. First, plane-wave solutions of Eq. (A37) are of the form $\phi(\mathbf{r}, t) = \exp[i(\mathbf{k}\cdot\mathbf{r} - Et)]$ with both positive and negative energies, $E = \pm\sqrt{k^2 + m^2}$. One may attempt to interpret the negative-energy waves as anti-particles of the positive-energy solutions, but this interpretation encounters problems as soon as interactions are introduced. For example, adding the interaction with the electromagnetic field, $\partial_\mu \to \partial_\mu - ieA_\mu$, we get an equation which admits transitions between negative- and positive-energy solutions. This problem affects all relativistic first-quantization formulations and can be bypassed only in second quantization. More serious is a second problem specific to the Klein-Gordon equation: The charge-current 4-vector
$$ j^\mu(x) = i\left[\phi^*(x)\,\partial^\mu\phi(x) - \partial^\mu\phi^*(x)\,\phi(x)\right] \tag{A38} $$
obeys the conservation law
$$ \partial_\mu j^\mu(x) = 0 , \tag{A39} $$
as it follows directly from Eq. (A37), but attempting to identify $\rho(x) = j^0(x) = -i[\dot\phi^*(x)\phi(x) - \phi^*(x)\dot\phi(x)]$ as a probability density yields negative values. For example, for an eigenstate of energy $E$ (time dependence $e^{-iEt}$), $\rho(x) = 2E\,\phi^*(x)\phi(x)$, which is negative for negative $E$. One may simply ignore negative-energy solutions, but, yet again, interactions can induce transitions to those and the interpretation becomes really problematic.
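Both difficulties can be made concrete with a plane wave in one spatial dimension. The sketch below is illustrative and not part of the notes: it assumes units with c = ħ = 1, the time dependence exp(−iEt) for an eigenstate of energy E, and hypothetical helper names. It checks that both signs of E satisfy the Klein-Gordon dispersion and evaluates the density $-i[\dot\phi^*\phi - \phi^*\dot\phi]$, which comes out as 2E|φ|² and therefore changes sign with E.

```python
import cmath
import math

def plane_wave(k, m, sign, x, t):
    """Return (E, phi, phi_dot) for phi = exp[i(k x - E t)], E = sign*sqrt(k^2 + m^2)."""
    energy = sign * math.sqrt(k * k + m * m)
    phi = cmath.exp(1j * (k * x - energy * t))
    phi_dot = -1j * energy * phi            # analytic time derivative of phi
    return energy, phi, phi_dot

def klein_gordon_residual(k, m, energy, phi):
    """(d^2/dt^2 - d^2/dx^2 + m^2) phi, evaluated analytically for the plane wave."""
    return (-energy * energy + k * k + m * m) * phi

def density(phi, phi_dot):
    """rho = -i [ conj(phi_dot) * phi - conj(phi) * phi_dot ], a real number."""
    rho = -1j * (phi_dot.conjugate() * phi - phi.conjugate() * phi_dot)
    return rho.real

for sign in (+1, -1):
    energy, phi, phi_dot = plane_wave(k=0.7, m=1.0, sign=sign, x=0.3, t=1.9)
    residual = abs(klein_gordon_residual(0.7, 1.0, energy, phi))
    print(f"E = {energy:+.4f}  residual = {residual:.1e}  rho = {density(phi, phi_dot):+.4f}")
```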
A second approach is quite different. We saw that squaring Eq. (A25) results in negative-energy solutions and negative probability densities. On the other hand, retaining the square root is not an option, since it results in nonlocal, unbounded operators. But looking at the expression
$$ E^2 = p^2 + m^2 \tag{A40} $$
one is reminded of the ‘factorization’ of the form $|A|^2 = \alpha^2 + \beta^2$ as $AA^*$ where $A = \alpha + i\beta$. This looks like a clean way to take some sort of ‘square root’. The quantity $i$ (the imaginary unit) does the trick when we consider the ‘two-dimensional’ form $AA^* = \alpha^2 + \beta^2$, but we need other mathematical objects in the four-dimensional case we are considering here. We require $E^2 = m^2 + p^2$ to be expressed in a form of the type $(\beta m + \boldsymbol{\alpha}\cdot\mathbf{p})(\beta m + \boldsymbol{\alpha}\cdot\mathbf{p})$, where $\boldsymbol{\alpha}$ and $\beta$ are mathematical entities yet to be determined. The relation we just wrote can be satisfied only if $\beta\alpha_i + \alpha_i\beta = 0$ and $\alpha_i\alpha_j + \alpha_j\alpha_i = 0$ for $i \neq j$ (so that all the unwanted cross-terms vanish) and $\alpha_i^2 = \beta^2 = 1$. This implies that the quantities $\beta$ and $\boldsymbol{\alpha}$ must be matrices. It turns out that these quantities (with algebraic properties related to the so-called ‘quaternions’) have to be $4 \times 4$ matrices; the equivalent set of Dirac matrices $\gamma^0 = \beta$ and $\gamma^i = \beta\alpha_i$ can be represented in terms of the Pauli matrices as follows:
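$$ \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ -\sigma_i & 0 \end{pmatrix} , \qquad \gamma^0 = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} , \tag{A41} $$
where $i = 1, 2, 3$, $I$ is the $2 \times 2$ identity matrix and $\sigma_i$ are the Pauli matrices:
$$ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \tag{A42} $$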
In the literature one may find several different conventions regarding the definitions of the Dirac matrices, the difference originating from the different conventions used for the metric: A 4-vector $x$ can be defined as $(x_0, \mathbf{x})$ with (+,−,−,−) metric, as done here. Alternatively, $x$ can be defined as $(\mathbf{x}, x_4) = (\mathbf{x}, ix_0)$ with purely imaginary $x_4$ and Euclidean metric (+,+,+,+).
Note that these matrices obey the anti-commutation laws:
$$ [\gamma^\mu, \gamma^\nu]_+ = 2g^{\mu\nu} , \tag{A43} $$
where $g^{\mu\nu} = g_{\mu\nu}$ is the diagonal metric tensor (+,−,−,−). To see that these matrices accomplish what we set out to do, let’s write the wave equation
$$ (-i\gamma^\mu\partial_\mu + m)\,\psi(x) = 0 \quad \left(\text{or } \left(-i\gamma^\mu\partial_\mu + \frac{mc}{\hbar}\right)\psi(x) = 0 \text{ with units restored, } x_0 = ct\right) , \tag{A44} $$
where $\psi$ is now a four-component spinor $\psi_\alpha$, with $\alpha = 0, \dots, 3$. This index can be viewed as the product of a spin index and a sign index: The sign index labels the sign of the energy (positive for electrons, negative for positrons, in Dirac’s original interpretation) and the spin index labels the spin orientation in the spin-1/2 case. If we multiply the Dirac equation, Eq. (A44), by $(-i\gamma^\lambda\partial_\lambda - m)$, we have
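$$ (-i\gamma^\lambda\partial_\lambda - m)(-i\gamma^\mu\partial_\mu + m)\,\psi(x) = (-\gamma^\lambda\gamma^\mu\partial_\lambda\partial_\mu - m^2)\,\psi(x) = 0 . \tag{A45} $$
Now, expressing $\gamma^\lambda\gamma^\mu\partial_\lambda\partial_\mu$ as 1/2 the sum of this quantity and itself, swapping the names of the dummy indices $\lambda$ and $\mu$ in the second term, noticing that $\partial_\lambda\partial_\mu = \partial_\mu\partial_\lambda$ and using the anticommutation properties, Eq. (A43), we can write this (after an overall change of sign) as:
$$ \left[\tfrac{1}{2}\left(\gamma^\lambda\gamma^\mu\partial_\lambda\partial_\mu + \gamma^\mu\gamma^\lambda\partial_\lambda\partial_\mu\right) + m^2\right]\psi(x) = \left[\tfrac{1}{2}[\gamma^\lambda, \gamma^\mu]_+\,\partial_\lambda\partial_\mu + m^2\right]\psi(x) = (\partial_\mu\partial^\mu + m^2)\,\psi(x) = 0 , \tag{A46} $$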
which is the Klein-Gordon equation. So, we have effectively ‘squared’ the dispersion and Dirac’s equation can be viewed as a sort of square root of the Klein-Gordon equation.
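The anti-commutation relations on which this argument rests can be verified directly from the explicit matrices. The following sketch builds the four matrices from the Pauli matrices using plain nested lists and checks every pair against $2g^{\mu\nu}$ times the identity; the helper names are illustrative and no external library is assumed.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
PAULI = [[[0, 1], [1, 0]],          # sigma_1
         [[0, -1j], [1j, 0]],       # sigma_2
         [[1, 0], [0, -1]]]         # sigma_3

# gamma^0 followed by gamma^1, gamma^2, gamma^3 in the representation of the text.
GAMMA = [block(I2, Z2, Z2, scale(-1, I2))]
GAMMA += [block(Z2, s, scale(-1, s), Z2) for s in PAULI]

METRIC = [1, -1, -1, -1]            # diagonal of g^{mu nu}

def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]

def clifford_holds(tol=1e-12):
    """True if gamma^mu gamma^nu + gamma^nu gamma^mu = 2 g^{mu nu} * identity."""
    for mu in range(4):
        for nu in range(4):
            acomm = anticommutator(GAMMA[mu], GAMMA[nu])
            for i in range(4):
                for j in range(4):
                    expected = 2 * METRIC[mu] if (mu == nu and i == j) else 0
                    if abs(acomm[i][j] - expected) > tol:
                        return False
    return True

print("anti-commutation relations satisfied:", clifford_holds())
```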
The plane-wave solutions of Dirac’s equation can be obtained as follows. Let’s express the desired solution in the form
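$$ \psi(x) = w(\mathbf{k}, E)\,e^{i(\mathbf{k}\cdot\mathbf{r} - Et)} , \tag{A47} $$
where $w(\mathbf{k}, E)$ is a four-component spinor which satisfies the equation
$$ (\gamma^0E - \boldsymbol{\gamma}\cdot\mathbf{k} - m)\,w(\mathbf{k}, E) = 0 . \tag{A48} $$
Let us now write $w(\mathbf{k}, E)$ in terms of two two-component objects $w_+$ and $w_-$:
$$ w(\mathbf{k}, E) = \begin{pmatrix} w_+(\mathbf{k}, E) \\ w_-(\mathbf{k}, E) \end{pmatrix} . \tag{A49} $$
The quantities $w_\pm$ must satisfy the two coupled two-component equations:
$$ \boldsymbol{\sigma}\cdot\mathbf{k}\,w_- = (E - m)\,w_+ , \qquad \boldsymbol{\sigma}\cdot\mathbf{k}\,w_+ = (E + m)\,w_- . \tag{A50} $$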
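For $m \neq 0$ it is convenient to employ the rest frame of the particle. We have either
$$ E = m , \qquad w_+ \neq 0 , \qquad w_- = 0 , \tag{A51} $$
or
$$ E = -m , \qquad w_+ = 0 , \qquad w_- \neq 0 . \tag{A52} $$
The nonzero component $w_\pm$ can be chosen arbitrarily, so it is convenient to choose it to be an eigenstate of $\sigma_z$. Thus we have the four solutions:
$$ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} , \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} , \quad \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} , \quad \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} . \tag{A53} $$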
The first two solutions correspond to particles of energy $E = m$ with spin up and down, respectively, while the latter two solutions correspond to particles of energy $E = -m$. The existence of negative-energy solutions constitutes the problem which, as mentioned above, affects all first-quantization relativistic formulations. Dirac suggested that these solutions should be interpreted as anti-particles (i.e., positrons in the case of electrons). Note how spin, introduced as an ad hoc new degree of freedom in conventional Quantum Mechanics, now emerges naturally from the formulation of a relativistic wave equation. This constitutes a major success of Dirac’s theory.
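For arbitrary $\mathbf{k}$ the solutions are:
$$ u_{\mathbf{k},\sigma} = w_\sigma(\mathbf{k}, E_k) = \left(\frac{E_k + m}{2m}\right)^{1/2} \begin{pmatrix} \xi_\sigma \\ \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{E_k + m}\,\xi_\sigma \end{pmatrix} , \tag{A54} $$
(for $\sigma = 1, 2$) and
$$ v_{\mathbf{k},\sigma} = w_\sigma(-\mathbf{k}, -E_k) = \left(\frac{E_k + m}{2m}\right)^{1/2} \begin{pmatrix} \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{E_k + m}\,\xi_\sigma \\ \xi_\sigma \end{pmatrix} . \tag{A55} $$
In these expressions $\xi_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\xi_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Considering now the case $m = 0$ (as for the early theories of neutrinos or looking at the analogy with the dispersion in graphene), we can choose the 4 independent 4-component spinors (in place of Eqns. (A54) and (A55)):
$$ u_{\mathbf{k},\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} \zeta_\sigma \\ \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{k}\,\zeta_\sigma \end{pmatrix} , \tag{A56} $$
(for $\sigma = 1, 2$) and
$$ v_{\mathbf{k},\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} \dfrac{\boldsymbol{\sigma}\cdot\mathbf{k}}{k}\,\zeta_\sigma \\ \zeta_\sigma \end{pmatrix} , \tag{A57} $$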
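corresponding to the positive- and negative-energy dispersion $E = \pm k$ on the ‘light cone’. The 2-component spinors $\zeta_\sigma$ are now chosen not as eigenstates of $\sigma_z$, but as eigenstates of the ‘helicity’ $\boldsymbol{\sigma}\cdot\mathbf{k}/k$:
$$ \boldsymbol{\sigma}\cdot\mathbf{n}\,\zeta_1 = \zeta_1 , \qquad \boldsymbol{\sigma}\cdot\mathbf{n}\,\zeta_2 = -\zeta_2 , \tag{A58} $$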
where $\mathbf{n} = \mathbf{k}/k$ is the unit vector along the direction of motion. This is rendered necessary by a profound difference between massless and massive particles: For massive particles one is free to select arbitrarily the rest-frame form of $w_\pm$ in Eqns. (A51)-(A55), since one can choose to define the polarization in the rest frame along any arbitrary direction. On the contrary, for massless particles traveling at the speed of light there is no such thing as a ‘rest frame’ and the polarization must be measured along the direction of motion, $\mathbf{k}/k$.
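The spinors above can be checked against the coupled equations (A50). The sketch below is illustrative only: the helper names and sample momenta are assumptions, and the explicit helicity eigenvectors use a standard parametrization by the polar angles (θ, φ) of n that is not given in the notes. It verifies the massive solution (A54), the helicity conditions (A58), and that the massless spinor (A56) satisfies Eq. (A50) with m = 0 and E = k.

```python
import cmath
import math

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

def sigma_dot(k):
    """2x2 matrix sigma . k for a 3-vector k."""
    return [[sum(k[a] * PAULI[a][i][j] for a in range(3)) for j in range(2)]
            for i in range(2)]

def apply(mat, vec):
    return [sum(mat[i][j] * vec[j] for j in range(2)) for i in range(2)]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(a, b))

def satisfies_a50(w_plus, w_minus, k, energy, m):
    """Check both equations of (A50) for the given upper and lower components."""
    sk = sigma_dot(k)
    first = close(apply(sk, w_minus), [(energy - m) * c for c in w_plus])
    second = close(apply(sk, w_plus), [(energy + m) * c for c in w_minus])
    return first and second

def massive_u(k, m, xi):
    """Upper and lower components of Eq. (A54), together with E_k."""
    energy = math.sqrt(sum(c * c for c in k) + m * m)
    norm = math.sqrt((energy + m) / (2.0 * m))
    lower = [c / (energy + m) for c in apply(sigma_dot(k), xi)]
    return energy, [norm * c for c in xi], [norm * c for c in lower]

def helicity_spinors(theta, phi):
    """Eigenvectors of sigma . n with eigenvalues +1 and -1 (assumed parametrization)."""
    zeta1 = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]
    zeta2 = [-cmath.exp(-1j * phi) * math.sin(theta / 2), math.cos(theta / 2)]
    return zeta1, zeta2

# Massive case, Eq. (A54), for a sample momentum and spin-up xi_1.
k_vec, mass = (0.4, -1.1, 0.8), 1.5
e_k, up, low = massive_u(k_vec, mass, [1, 0])
print("massive u satisfies (A50):", satisfies_a50(up, low, k_vec, e_k, mass))

# Massless case, Eqs. (A56) and (A58), for a sample direction and |k| = 2.
theta, phi, k_mag = 1.1, 0.4, 2.0
n = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
k_massless = tuple(k_mag * c for c in n)
for zeta, helicity in zip(helicity_spinors(theta, phi), (+1, -1)):
    is_eigen = close(apply(sigma_dot(n), zeta), [helicity * c for c in zeta])
    w_plus = [c / math.sqrt(2) for c in zeta]
    w_minus = [c / math.sqrt(2) for c in apply(sigma_dot(n), zeta)]
    ok = satisfies_a50(w_plus, w_minus, k_massless, k_mag, 0.0)
    print(f"helicity {helicity:+d}: eigenstate = {is_eigen}, (A50) with E = k: {ok}")
```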