Principles of special relativity


• Introduction.

Here we shall discuss very succinctly the basic principles of special relativity. There are two reasons for doing this. First, when introducing optical devices, we saw that the electron-photon perturbation Hamiltonian is given by (see Lecture Notes, page 125, Eq. (253); page 281, Eq. (726)):

$$ H_{\rm phot} = \frac{i e \hbar}{m c}\, \mathbf{A} \cdot \nabla . \qquad \text{(A1)} $$

This interaction term can be obtained by starting from the free-electron Hamiltonian $p^2/(2m)$ and replacing the electron momentum p with p − eA/c. (The factor of c in the denominators appears when using Gaussian units, which are more convenient in this context, and we shall use them here.) The reason behind this substitution relies on some basic principles of special relativity, and it is worth understanding how this comes about.

The second reason stems from the recent popularity of graphene as a ‘promising’ material for nanoelectronic applications. The band structure of graphene shows a unique feature: At the Fermi level the valence and conduction bands cross with linear dispersion. This E(k) relation resembles the dispersion of relativistic massless particles, and it has triggered interest in exploiting what is known about relativistic massless spin-1/2 particles in order to understand the electronic properties of graphene. Such particles are described by a spin-1/2 relativistic wave equation (the Dirac equation for zero-mass spin-1/2 fermions), so it is interesting to see how the Schrödinger equation can be generalized to satisfy the principle of relativistic invariance.

• Galilean invariance and Maxwell’s equations.

As soon as Maxwell’s equations were formulated, it became clear that there was a major difference with respect to Newton’s laws. Let’s start with Newton’s second law,

$$ \mathbf{F} = m\mathbf{a} \quad \text{or} \quad F = m \frac{d^2 x}{dt^2} , \qquad \text{(A2)} $$

and consider this same equation as it would be written by somebody (called an ‘observer’) who is moving with respect to us with uniform velocity u. Let’s assume that we and the other observer use reference frames with parallel x, y, and z axes, that the other observer moves along the x axis, so that u = (u,0,0), and that the origins of the two reference frames coincide at a fixed instant in time, which we take as t = 0. Then, calling (x, y, z) our frame and (x′, y′, z′) the other observer’s frame, an object located at a point (x, y, z) in our frame will be located at a point (x′, y′, z′) in the observer’s frame, such that:

$$ x' = x - ut , \qquad y' = y , \qquad z' = z . \qquad \text{(A3)} $$

Therefore, expressing Newton’s second law in the ‘primed’ frame,

$$ F' = m \frac{d^2 x'}{dt^2} = m \frac{d^2 (x - ut)}{dt^2} = m \left( \frac{d^2 x}{dt^2} - \frac{du}{dt} \right) = m \frac{d^2 x}{dt^2} . \qquad \text{(A4)} $$

In other words, Newton’s law retains the same algebraic form in all frames which are moving with uniform velocity with respect to our frame. This principle can be generalized by saying that the laws of mechanics are valid in all ‘inertial frames’. An observer, by performing experiments, cannot tell whether he/she is moving with respect to other inertial frames. This is the principle of Galilean relativity, since it was proposed (noted, discovered, invented?... the choice of a term is a matter of deep philosophical discussions) by Galileo.

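Consider now Maxwell’s equations. For simplicity, let’s just consider a wave equation, which we assume to hold in the primed frame:

$$ \left( \nabla'^2 - \frac{1}{c^2} \frac{\partial^2}{\partial t'^2} \right) \psi = 0 . \qquad \text{(A5)} $$

Expressing it in our coordinates by means of the Galilean transformation, Eq. (A3), with t′ = t, so that ∂ψ/∂x′ = ∂ψ/∂x and ∂ψ/∂t′ = ∂ψ/∂t + u ∂ψ/∂x, we find:

$$ \left( \nabla^2 - \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \frac{2u}{c^2} \frac{\partial^2}{\partial x\, \partial t} - \frac{u^2}{c^2} \frac{\partial^2}{\partial x^2} \right) \psi = 0 . \qquad \text{(A6)} $$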

What a mess! The form of the equation has been completely altered by the transformation! In hindsight, we already knew it had to be so: After all, magnetic fields caused by moving charges must disappear when we use a frame in which the charges are at rest. Therefore, the E and B fields do not transform correctly under the Galilean transformation, Eq. (A3). Moreover, the Lorentz force depends explicitly on the velocity of the particle, so that the form of the equation will differ in a different inertial frame. Historically, this is also related to the difficulty of understanding electromagnetic waves: Sound waves are oscillations of the medium in which they propagate. But electromagnetic waves are oscillations of what?


In order to ﬁx the situation we have three alternatives we can choose from:

1. Maxwell’s equations are wrong. The correct equations, yet to be discovered, are invariant under Galilean transformations.

2. Galilean invariance is valid for mechanics, not for electromagnetism. This is the historical solution before Einstein: The ‘ether’ determines the existence of the ‘absolute frame’ in which the ether is at rest and Maxwell’s equations hold.

3. Galilean invariance is wrong. There is a more general invariance, yet to be discovered, which preserves the form of Maxwell’s equations. Classical mechanics is incorrect and must be reformulated so that it is invariant under this new transformation.

Having to choose between trashing Maxwell (option 1) or Newton (option 3), physicists chose the easier option 2. Einstein, instead, decided to follow the third option, guided by two postulates:

1. Postulate of relativity: All physical laws must ‘look the same’ in all frames moving with uniform velocity with respect to each other.

2. Postulate of the constancy of the speed of light: The speed of light is the same (numerically the same!) independent of the velocity of the observer or of its source. This stems logically from the Michelson-Morley experiment of 1887, but the result could have been explained ‘saving’ ether and using the Lorentz-Fitzgerald contractions.

Armed with these postulates, Einstein set out to build a new set of transformations between inertial frames. Maxwell’s equations are now invariant under this new set of transformations, but Newtonian mechanics has to be modified: If two frames move at a relative speed much smaller than the speed of light, the ‘new’ transformations approach the usual Galilean transformation and Newton is approximately correct. But for relative velocities approaching the speed of light, the laws of mechanics deviate enormously from Newton’s laws.

It is impossible to do justice to special relativity in such a short time. We shall only discuss those few concepts which are required to answer our original question of why we perform the substitution p → p − (e/c)A.

• Lorentz transformations.

Consider the same two frames (‘primed’ K′ and ‘unprimed’ K frames) considered above. Assume now that at t = 0 (the time at which the origins of the two frames coincided... at least when looking at our clock...!) a ray of light was emitted from the origin. Since light travels at the same speed in both frames, we must have for the wavefronts of the emitted light:

$$ c^2 t^2 - x^2 - y^2 - z^2 = 0 \quad \text{and} \quad c^2 t'^2 - x'^2 - y'^2 - z'^2 = 0 , \qquad \text{(A7)} $$

so that, assuming that space-time is isotropic and homogeneous,

$$ c^2 t'^2 - x'^2 - y'^2 - z'^2 = \lambda(u) \left[ c^2 t^2 - x^2 - y^2 - z^2 \right] . \qquad \text{(A8)} $$

The function λ(u) is a possible velocity-dependent change of scale between the two frames. However, since going from K′ to K must involve the same transformation as going from K to K′ with a sign-flip for u, we must have λ = 1. Using the notation x₀ = ct, x₁ = x, x₂ = y, and x₃ = z, Eq. (A8) is satisfied if:

$$ x'_0 = \gamma (x_0 - \beta x_1) , \qquad x'_1 = \gamma (x_1 - \beta x_0) , \qquad x'_2 = x_2 , \qquad x'_3 = x_3 , \qquad \text{(A9)} $$

where

$$ \beta = \frac{u}{c} \quad \text{and} \quad \gamma = (1 - \beta^2)^{-1/2} . \qquad \text{(A10)} $$

The transformations Eq. (A9) are called ‘Lorentz transformations’. Obviously the inverse transformations read:

$$ x_0 = \gamma (x'_0 + \beta x'_1) , \qquad x_1 = \gamma (x'_1 + \beta x'_0) , \qquad x_2 = x'_2 , \qquad x_3 = x'_3 . \qquad \text{(A11)} $$

Note that, unlike the Galilean transformations, now time and space transform together: Simultaneous events in one frame will not be simultaneous in another frame. Indeed consider two events (ct₁,x₁) and (ct₂,x₂).


Thanks to Eq. (A8), their ‘distance’

$$ s_{12}^2 = c^2 (t_1 - t_2)^2 - |\mathbf{x}_1 - \mathbf{x}_2|^2 \qquad \text{(A12)} $$

is an ‘invariant’ (that is, it is the same in all inertial frames). If $s_{12}^2 > 0$ the separation between the events is said to be ‘time-like’: It is always possible to find a Lorentz transformation (actually with β = |x₁−x₂|/(c|t₁−t₂|)) such that in the transformed frame the two events are at the same spatial location, separated only by time. If $s_{12}^2 < 0$ the separation between the events is said to be ‘space-like’: It is always possible to find a Lorentz transformation such that in the transformed frame the two events are simultaneous, separated only spatially. If, finally, $s_{12}^2 = 0$, the separation is said to be ‘light-like’: One event lies on the ‘light-cone’ of the other.
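The invariance of the interval under the boost of Eq. (A9) can be checked numerically. The following is a minimal sketch restricted to the (ct, x) plane; the function names and the sample event (ct = 5, x = 3 in arbitrary units) are illustrative assumptions, not part of the notes.

```python
import math

def boost(x0, x1, beta):
    """Lorentz boost along x, Eq. (A9): returns (x0', x1') for beta = u/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (x0 - beta * x1), gamma * (x1 - beta * x0)

def interval2(x0, x1):
    """Squared interval between the event (x0, x1) and the origin."""
    return x0 * x0 - x1 * x1

# Hypothetical time-like event: ct = 5, x = 3, so s^2 = 25 - 9 = 16 > 0.
x0, x1 = 5.0, 3.0

# For a time-like separation, beta = |dx| / (c |dt|) brings both events
# to the same spatial location in the transformed frame.
beta_rest = x1 / x0
x0p, x1p = boost(x0, x1, beta_rest)

print("s^2 in K :", interval2(x0, x1))
print("s^2 in K':", interval2(x0p, x1p))
print("x' of the event in K':", x1p)
```

The two printed values of s² should agree to rounding error, and the transformed spatial coordinate should vanish, as stated in the text for time-like separations.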

• Proper time and time dilatation.

Since ‘time’ has become an observer-dependent quantity, when dealing with moving particles it is convenient to consider the time in the frame in which the particle is at rest. If v(t) is the velocity of the moving particle in ‘our’ frame, let’s consider the invariant:

$$ ds^2 = dx_\mu dx^\mu = c^2 dt^2 - dx^2 - dy^2 - dz^2 = c^2 dt^2 - v^2 dt^2 = c^2 (1 - v^2/c^2)\, dt^2 = c^2 dt^2 / \gamma^2 . \qquad \text{(A13)} $$

Since this quantity is invariant (that is, it is numerically the same in all inertial frames), in the frame in which the particle is at rest it will be $ds^2 = c^2 d\tau^2$, where τ is the time in that frame. This time is called ‘proper time’. If the particle travels over a time interval τ₂ − τ₁ in its proper time, as seen by us the time interval will stretch to the interval t₂ − t₁ obtained by integrating Eq. (A13) along the particle trajectory:

$$ t_2 - t_1 = \int_{\tau_1}^{\tau_2} \gamma(\tau)\, d\tau = \int_{\tau_1}^{\tau_2} \frac{d\tau}{\sqrt{1 - v(\tau)^2/c^2}} . \qquad \text{(A14)} $$

Since γ > 1, the time interval we observe, t₂ − t₁, is longer than the proper time interval τ₂ − τ₁. This has been experimentally verified: The µ-mesons (actually, leptons) produced by cosmic-ray hits in the upper atmosphere often reach the ground. Since the lifetime of the µ-meson is about 2.2 µs, even at the speed of light the particle could not travel more than about 660 m before decaying. Yet, they can easily be detected after having traveled distances more than two orders of magnitude longer (the thickness of the atmosphere, of the order of $10^5$ m). This is because their lifetime, as observed by us, is stretched enormously, as these particles travel at speeds approaching the speed of light.
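The muon estimate above is easy to reproduce. The sketch below uses the 2.2 µs lifetime quoted in the text; the sample values of β are illustrative assumptions (not measured muon speeds), chosen only to show how quickly the mean decay length grows as β → 1.

```python
import math

C = 2.998e8        # speed of light in m/s
TAU_MUON = 2.2e-6  # muon proper lifetime in s (value quoted in the text)

def gamma_factor(beta):
    """Lorentz factor gamma = (1 - beta^2)^(-1/2), Eq. (A10)."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def mean_decay_length(beta):
    """Mean distance travelled in our frame: speed times dilated lifetime."""
    return beta * C * gamma_factor(beta) * TAU_MUON

# Range without time dilation, even at the speed of light (about 660 m).
print("c * tau =", C * TAU_MUON, "m")

# Illustrative (assumed) speeds.
for beta in (0.9, 0.99, 0.999, 0.9999):
    print(beta, gamma_factor(beta), mean_decay_length(beta))
```

With these assumed speeds the decay length rises from the kilometer scale to tens of kilometers, which is the order of magnitude needed to cross the atmosphere.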

• Lorentz contraction.

Consider a rod of length L′ at rest in the K′ frame. Let the ends of the rod be at x′ = x′₁ and x′ = x′₂, so that L′ = x′₂ − x′₁. What is the length of the rod in our K frame? By the Lorentz transformations, Eq. (A9) (which we must use since, when we measure the length of the rod, we measure the positions of its ends at the same time in our frame), we have:

$$ L = x_2 - x_1 = \frac{1}{\gamma} (x'_2 - x'_1) = \frac{L'}{\gamma} < L' . \qquad \text{(A15)} $$

The rod in our frame appears shorter than in its rest frame. This is the Lorentz-FitzGerald contraction which was postulated (without proof or arguments behind) in order to explain the Michelson-Morley experiment.

• Addition of velocities.

Let’s consider the Lorentz transformation

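$$ dx_0 = \gamma (dx'_0 + \beta\, dx'_1) , \qquad dx_1 = \gamma (dx'_1 + \beta\, dx'_0) , \qquad dx_2 = dx'_2 , \qquad dx_3 = dx'_3 . \qquad \text{(A16)} $$

In our K frame, the velocity $v = c\, dx_1/dx_0$ of a particle moving with velocity $v' = c\, dx'_1/dx'_0$ in the K′ frame will be:

$$ v = c \frac{dx_1}{dx_0} = c \frac{dx'_1 + \beta\, dx'_0}{dx'_0 + \beta\, dx'_1} = c \frac{dx'_0 (dx'_1/dx'_0 + \beta)}{dx'_0 (1 + \beta\, dx'_1/dx'_0)} = \frac{v' + u}{1 + v'u/c^2} . \qquad \text{(A17)} $$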

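Note how the particle velocity v′ and the frame velocity u add, as seen by us: In the limit of small velocities, v′u ≪ c², we recover the Galilean result v ≈ v′ + u, whereas for v′ → c Eq. (A17) gives v → c, so that the velocity measured by us cannot exceed c. This is consistent with the second postulate. Note that the 4-vector

$$ U^\mu = (\gamma c, \gamma \mathbf{u}) \qquad \text{(A18)} $$

transforms like the coordinate vector $x^\mu$.

Note that we shall use the contravariant (e.g., $x^\mu$) and covariant (e.g., $x_\mu = g_{\mu\nu} x^\nu$) notation. Using the metric (+,−,−,−) for µ = 0, ..., 3, we have $x^\mu = (x_0, \mathbf{x})$ but $x_\mu = (x_0, -\mathbf{x})$. Thus $x_\mu x^\mu = x_0^2 - \mathbf{x} \cdot \mathbf{x}$.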
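The behavior of the velocity-addition law, Eq. (A17), in the two limits can be illustrated with a few lines of code. This is a minimal sketch in units with c = 1 by default; the function name and the sample velocities are illustrative assumptions.

```python
def add_velocities(v_prime, u, c=1.0):
    """Relativistic composition of collinear velocities, Eq. (A17)."""
    return (v_prime + u) / (1.0 + v_prime * u / c**2)

# Small velocities: essentially the Galilean sum v' + u.
print(add_velocities(0.001, 0.002))

# Two large velocities: the result stays below c.
print(add_velocities(0.9, 0.9))

# A light signal (v' = c) still moves at c in our frame.
print(add_velocities(1.0, 0.5))
```

The last call shows the second postulate at work: composing c with any frame velocity returns c.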

• 4-momentum.

In classical mechanics the momentum and energy of a particle are:

$$ E = E(0) + \frac{1}{2} m u^2 , \qquad \mathbf{p} = m \mathbf{u} . \qquad \text{(A19)} $$

The term E(0) is a constant which refers to the rest-energy of the particle. It is usually ignored in non-relativistic discussions.

In order to generalize these concepts, we can start by looking for general functions of the velocity, E(u) and M(u), such that:

$$ E = E(u) , \qquad \mathbf{p} = M(u)\, \mathbf{u} , \qquad \text{(A20)} $$

with the constraints (dictated by the fact that we want to recover Eq. (A19) in the limit u → 0):

$$ \frac{\partial E}{\partial u^2}(0) = \frac{m}{2} , \qquad M(0) = m . \qquad \text{(A21)} $$

The general expressions for the functions E(u) and M(u) can be obtained by analyzing the elastic collision of two identical particles and requiring momentum and energy conservation in two inertial frames K and K′. We’ll skip the derivation (see Jackson, Classical Electrodynamics, Sec. 11.5) and simply state the result: The general form for these two functions consistent with the two postulates and with energy and momentum conservation is:

$$ E(u) = \gamma m c^2 , \qquad M(u) = \gamma m . \qquad \text{(A22)} $$

Note that in the limit u → 0, $E = \gamma m c^2 \to mc^2 + mu^2/2$, which is the classical result with a rest energy $mc^2$.

From this we can define an energy-momentum 4-vector, $p^\mu$:

$$ p^\mu = (\gamma m c, \gamma m \mathbf{u}) = (E/c, \gamma m \mathbf{u}) = m U^\mu , \qquad \text{(A23)} $$

having used Eq. (A18) in the last step.

The invariant length of the energy-momentum 4-vector is

$$ p_\mu p^\mu = p_0^2 - |\mathbf{p}|^2 = E^2/c^2 - \gamma^2 m^2 u^2 = \gamma^2 (m^2 c^2 - m^2 u^2) = m^2 c^2 . \qquad \text{(A24)} $$

Finally, from this expression we can write the energy E of a particle as:

$$ E = \sqrt{c^2 p^2 + m^2 c^4} . \qquad \text{(A25)} $$
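The relations (A22)-(A25) and the non-relativistic limit can be verified numerically. The sketch below works in one spatial dimension with units in which m = c = 1; the function name and the sample velocities are illustrative assumptions.

```python
import math

def energy_momentum(m, u, c=1.0):
    """Return (E, p) = (gamma m c^2, gamma m u), Eqs. (A20) and (A22)."""
    gamma = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    return gamma * m * c**2, gamma * m * u

m, u, c = 1.0, 0.6, 1.0
E, p = energy_momentum(m, u, c)

# Invariant length of the 4-momentum, Eq. (A24): E^2/c^2 - p^2 = m^2 c^2.
print((E / c) ** 2 - p**2, (m * c) ** 2)

# Energy-momentum dispersion, Eq. (A25).
print(E, math.sqrt(c**2 * p**2 + m**2 * c**4))

# Low-velocity limit: E - m c^2 approaches the kinetic energy m u^2 / 2.
u_small = 1.0e-3
E_small, _ = energy_momentum(m, u_small, c)
print(E_small - m * c**2, 0.5 * m * u_small**2)
```

Each printed pair should agree, the last one only up to corrections of relative order u²/c².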

• Lorentz force.

Our goal now is to re-express the Lorentz force (recall that we are using Gaussian units here):

$$ \frac{d\mathbf{p}}{dt} = e \left( \mathbf{E} + \frac{\mathbf{u}}{c} \times \mathbf{B} \right) \qquad \text{(A26)} $$

in a manifestly covariant form and find a Hamiltonian from which Eq. (A26) may be derived.

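To this end, we supplement Eq. (A26) with the power-balance equation $dp_0/dt = (e/c)\, \mathbf{u} \cdot \mathbf{E}$ (with $p_0 = E/c$ the energy component of the 4-momentum), so that we can employ the 4-vector $p^\mu$:

$$ \frac{d\mathbf{p}}{d\tau} = \gamma \frac{d\mathbf{p}}{dt} = e \gamma \mathbf{E} + e \gamma \frac{\mathbf{u}}{c} \times \mathbf{B} = \frac{e}{c} (U_0 \mathbf{E} + \mathbf{U} \times \mathbf{B}) , \qquad \frac{dp_0}{d\tau} = \gamma \frac{dp_0}{dt} = \gamma \frac{e}{c}\, \mathbf{u} \cdot \mathbf{E} = \frac{e}{c}\, \mathbf{U} \cdot \mathbf{E} . \qquad \text{(A27)} $$

Defining the electromagnetic field tensor:

$$ F^{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix} , \qquad \text{(A28)} $$

we can write Eq. (A27) in the manifestly covariant form:

$$ \frac{dp^\mu}{d\tau} = \frac{e}{c} F^{\mu\nu} U_\nu . \qquad \text{(A29)} $$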
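That the contraction in Eq. (A29) reproduces the two right-hand sides of Eq. (A27) can be checked component by component. The sketch below omits the common prefactor e/c and uses arbitrary (assumed) numerical values for the fields and for the 4-velocity, since the identity is purely algebraic; the covariant components are taken as $U_\nu = (U_0, -\mathbf{U})$, following the (+,−,−,−) metric used in these notes.

```python
def field_tensor(E, B):
    """Contravariant field tensor F^{mu nu} of Eq. (A28), Gaussian units."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def covariant_rhs(E, B, U0, U):
    """F^{mu nu} U_nu with U_nu = (U0, -U), i.e. Eq. (A29) without e/c."""
    F = field_tensor(E, B)
    U_lower = [U0, -U[0], -U[1], -U[2]]
    return [sum(F[mu][nu] * U_lower[nu] for nu in range(4)) for mu in range(4)]

def three_vector_rhs(E, B, U0, U):
    """(U.E, U0 E + U x B), i.e. Eq. (A27) without e/c."""
    UxB = cross(U, B)
    return [dot(U, E)] + [U0 * E[i] + UxB[i] for i in range(3)]

# Arbitrary sample values (assumptions made only for this check).
E_field = [0.3, -1.2, 0.5]
B_field = [0.7, 0.1, -0.4]
U0, U = 1.25, [0.45, -0.3, 0.6]

lhs = covariant_rhs(E_field, B_field, U0, U)
rhs = three_vector_rhs(E_field, B_field, U0, U)
print(lhs)
print(rhs)
```

The two printed lists should coincide: the µ = 0 entry is the power-balance term and the remaining three are the components of the Lorentz force.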

We must now ﬁnd a Hamiltonian function whose dynamic equations (the Hamilton equations of motion) yield the Lorentz-force equation. To do this correctly it would be necessary to develop a bit of Lagrangian theory. So, here we follow a ‘pragmatic approach’: Let’s deﬁne the Hamiltonian:

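$$ H = \sqrt{c^2 [\mathbf{p} - (e/c)\mathbf{A}]^2 + m^2 c^4} + e\Phi , \qquad \text{(A30)} $$

where Φ is the scalar potential. Considering that the particle velocity in terms of p and A is (as it follows from Lagrangian theory):

$$ \mathbf{u} = \frac{c\mathbf{p} - e\mathbf{A}}{\sqrt{[\mathbf{p} - (e/c)\mathbf{A}]^2 + m^2 c^2}} , \qquad \text{(A31)} $$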

with some algebra one can verify that the Hamilton equations of motion (i = 1,2,3):

$$ -\frac{\partial H}{\partial x_i} = \frac{dp_i}{dt} \qquad \text{(A32)} $$

are equivalent to the Lorentz-force equation (the first of the two Eqns. (A27)). Comparing the Hamiltonian, Eq. (A30), with Eq. (A25) we see that the interaction between a charged particle and the electromagnetic field has been accounted for by replacing the particle momentum p with p − (e/c)A, which is what we wanted to show.

• Relativistic wave equations.

One possible way to ‘derive’ (wrong term, but let’s ignore deep philosophical discussions!) Schrödinger’s equation is to write the energy-momentum dispersion for a free particle of mass m,

$$ E = \frac{p^2}{2m} , \qquad \text{(A33)} $$

set E → iħ∂/∂t, p → −iħ∇, and view Eq. (A33) as a relation between operators acting on a vector ψ in some Hilbert space H. Thus:

$$ E = \frac{p^2}{2m} \;\to\; -\frac{\hbar^2}{2m} \nabla^2 \psi(\mathbf{r}, t) = i\hbar \frac{\partial}{\partial t} \psi(\mathbf{r}, t) , \qquad \text{(A34)} $$

which is the wave equation for non-relativistic particles. Searching for a relativistic wave equation, we should start by replacing the non-relativistic energy-momentum relation given by Eq. (A33) with its relativistic counterpart given by Eq. (A25). We immediately encounter a big problem: Eq. (A25) contains a square root. As innocent as this may seem, viewing this equation in terms of operators would yield a mathematically very nasty outcome: Taking the square root of a linear differential operator yields a nonlocal, unbounded (so, not continuous in the operator sense) operator. As a first, alternative approach, let’s avoid this ‘square-root’ issue by considering the square of Eq. (A25):

$$ E^2 = c^2 p^2 + m^2 c^4 \;\to\; -\hbar^2 \frac{\partial^2}{\partial t^2} \phi(\mathbf{r}, t) = -c^2 \hbar^2 \nabla^2 \phi(\mathbf{r}, t) + m^2 c^4 \phi(\mathbf{r}, t) , \qquad \text{(A35)} $$

or

$$ (c^2 \hbar^2 \Box^2 + m^2 c^4)\, \phi(\mathbf{r}, t) = 0 , \qquad \text{(A36)} $$

where the d’Alembert operator is defined as $\Box^2 = (1/c^2)\, \partial^2/\partial t^2 - \nabla^2$ and φ represents the relativistic state vector. This is called the ‘Klein-Gordon equation’. In relativistic quantum mechanics it is customary to use units in which c = ħ = 1, so the Klein-Gordon equation, Eq. (A36), is written simply as:

$$ (\Box^2 + m^2)\, \phi(x) = [\partial_\mu \partial^\mu + m^2]\, \phi(x) = 0 , \qquad \text{(A37)} $$

where x is the 4-vector (r, t), inner products have the metric (−,−,−,+), and $\Box^2$ takes the form $\partial^2/\partial t^2 - \nabla^2 = \partial_\mu \partial^\mu$, the latter form having been written using the notation $\partial_\mu = \partial/\partial x^\mu$ and also the ‘usual’ relativistic convention according to which the index µ runs over the 4 dimensions and the sum is implied over repeated indices. The Klein-Gordon equation is relativistically covariant, as desired, but its interpretation as a wave equation for a single particle (so, not as a second-quantization field equation) is problematic for two major reasons. First, plane-wave solutions of Eq. (A37) are of the form $\phi(\mathbf{r}, t) = \exp[i(\mathbf{k} \cdot \mathbf{r} - Et)]$ with both positive and negative energies, $E = \pm\sqrt{k^2 + m^2}$. One may attempt to interpret the negative-energy waves as anti-particles of the positive-energy solutions, but this interpretation encounters problems as soon as interactions are introduced. For example, adding the interaction with the electromagnetic field, $\partial_\mu \to \partial_\mu - ieA_\mu$, we get an equation which admits transitions between negative- and positive-energy solutions. This problem affects all relativistic first-quantization formulations and can be bypassed only in second quantization. More serious is a second problem specific to the Klein-Gordon equation: The charge-current 4-vector

$$ j^\mu(x) = i \left[ (\partial^\mu \phi^*(x))\, \phi(x) - \phi^*(x)\, \partial^\mu \phi(x) \right] , \qquad \text{(A38)} $$

obeys the conservation law

$$ \partial_\mu j^\mu(x) = 0 , \qquad \text{(A39)} $$

as it follows directly from Eq. (A37), but attempting to identify $\rho(x) = -j^0(x) = -i[\dot\phi^*(x)\phi(x) - \phi^*(x)\dot\phi(x)]$ as a probability density yields negative values. For example, for an eigenstate of energy E, $\rho(x) = 2E\phi^*(x)\phi(x)$, which is negative for negative E. One may simply ignore negative-energy solutions, but, yet again, interactions can induce transitions to those and the interpretation becomes really problematic.

A second approach is quite different. We saw that squaring Eq. (A25) results in negative-energy solutions and negative probability densities. On the other hand, retaining the square root is not an option, since it results in nonlocal, unbounded operators. But looking at the expression

$$ E^2 = p^2 + m^2 \qquad \text{(A40)} $$

one is reminded of the ‘factorization’ of the form $\alpha^2 + \beta^2 = AA^*$, where $A = \alpha + i\beta$. This looks like a clean way to take some sort of ‘square root’. The quantity i (the imaginary unit) does the trick when we consider the ‘two-dimensional’ form $AA^* = \alpha^2 + \beta^2$, but we need other mathematical objects in the four-dimensional case we are considering here. We require $E^2 = m^2 + p^2$ to be expressed in a form of the type $(\beta m + \boldsymbol{\alpha} \cdot \mathbf{p})(\beta m + \boldsymbol{\alpha} \cdot \mathbf{p})$, where α and β are mathematical entities yet to be determined. The relation we just wrote can be satisfied only if $\beta\alpha_i + \alpha_i\beta = 0$ and $\alpha_i\alpha_j + \alpha_j\alpha_i = 0$ for i ≠ j (so that all the unwanted cross-terms vanish) and $\alpha_i^2 = \beta^2 = 1$. This implies that the quantities β and α must be matrices. It turns out that these quantities (with algebraic properties related to the so-called ‘quaternions’) have to be 4 × 4 matrices. Written in terms of the combinations $\gamma^0 = \beta$ and $\gamma^i = \beta\alpha_i$, they can be represented by means of the Pauli matrices as follows:

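$$ \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ -\sigma_i & 0 \end{pmatrix} , \qquad \gamma^0 = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} , \qquad \text{(A41)} $$

where i = 1, 2, 3, I is the 2 × 2 identity matrix and $\sigma_i$ are the Pauli matrices:

$$ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \qquad \text{(A42)} $$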

In the literature one may find several different conventions regarding the definitions of the Dirac matrices, the difference originating from the different conventions used for the metric: A 4-vector x can be defined as (x₀, x) with (+,−,−,−) metric, as done here. Alternatively, x can be defined as (x, x₄) = (x, ix₀) with purely imaginary x₄ and Euclidean metric (+,+,+,+).

Note that these matrices obey the anti-commutation laws:

$$ [\gamma^\mu, \gamma^\nu]_+ = 2 g^{\mu\nu} , \qquad \text{(A43)} $$

where $g^{\mu\nu} = g_{\mu\nu}$ is the diagonal metric tensor (+,−,−,−). To see that these matrices accomplish what we set out to do, let’s write the wave equation

$$ (-i\gamma^\mu \partial_\mu + m)\, \psi(x) = 0 \quad \left( \text{or } \left( -i\gamma^\mu \partial_\mu + \frac{mc}{\hbar} \right) \psi(x) = 0 \text{ with units restored, } x_0 = ct \right) , \qquad \text{(A44)} $$

where ψ is now a four-component object $\psi_\alpha$, with α = 0, ..., 3. This index can be viewed as the product of a spin index and a sign index, the latter labeling the sign of the energy (positive for electrons, negative for positrons, in Dirac’s original interpretation) and the former the spin for the spin-1/2 case. If we multiply the Dirac equation, Eq. (A44), by $(-i\gamma^\lambda \partial_\lambda - m)$, we have

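$$ (-i\gamma^\lambda \partial_\lambda - m)(-i\gamma^\mu \partial_\mu + m)\, \psi(x) = (-\gamma^\lambda \gamma^\mu \partial_\lambda \partial_\mu - m^2)\, \psi(x) = 0 . \qquad \text{(A45)} $$

Now, expressing $\gamma^\lambda \gamma^\mu \partial_\lambda \partial_\mu$ as one half of the sum of this quantity with itself, swapping the names of the dummy indices λ and µ in the second term, noticing that $\partial_\lambda \partial_\mu = \partial_\mu \partial_\lambda$ and using the anticommutation properties, Eq. (A43), we can write this as:

$$ \left[ \frac{1}{2} (\gamma^\lambda \gamma^\mu \partial_\lambda \partial_\mu + \gamma^\mu \gamma^\lambda \partial_\lambda \partial_\mu) + m^2 \right] \psi(x) = \left[ \frac{1}{2} [\gamma^\lambda, \gamma^\mu]_+ \partial_\lambda \partial_\mu + m^2 \right] \psi(x) = (\partial_\mu \partial^\mu + m^2)\, \psi(x) = 0 , \qquad \text{(A46)} $$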

which is the Klein-Gordon equation. So, we have eﬀectively ‘squared’ the dispersion and Dirac’s equation can be viewed as a sort of square root of the Klein-Gordon equation.
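The anticommutation relations, Eq. (A43), on which this ‘squaring’ rests can be verified directly for the representation of Eqs. (A41) and (A42). The following is a minimal sketch in pure Python with nested lists as matrices; the helper names (matmul, block, check_clifford, and so on) are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(s, a):
    """Multiply every entry of matrix a by the scalar s."""
    return [[s * x for x in row] for row in a]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [
    [[0, 1], [1, 0]],      # sigma_1
    [[0, -1j], [1j, 0]],   # sigma_2
    [[1, 0], [0, -1]],     # sigma_3
]

# gamma^0 and gamma^i in the representation of Eq. (A41).
GAMMA = [block(I2, Z2, Z2, scale(-1, I2))]
GAMMA += [block(Z2, s, scale(-1, s), Z2) for s in SIGMA]

METRIC = [1, -1, -1, -1]   # diagonal of g^{mu nu}

def anticommutator(a, b):
    """[a, b]_+ = ab + ba for 4x4 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]

def check_clifford():
    """True if [gamma^mu, gamma^nu]_+ = 2 g^{mu nu} times the identity."""
    for mu in range(4):
        for nu in range(4):
            diag = 2 * METRIC[mu] if mu == nu else 0
            ac = anticommutator(GAMMA[mu], GAMMA[nu])
            for i in range(4):
                for j in range(4):
                    target = diag if i == j else 0
                    if abs(ac[i][j] - target) > 1e-12:
                        return False
    return True

print(check_clifford())  # True when Eq. (A43) holds for these matrices
```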

The plane-wave solutions of Dirac’s equation can be obtained as follows. Let’s express the desired solution in the form

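$$ \psi(x) = w(\mathbf{k}, E)\, e^{i(\mathbf{k} \cdot \mathbf{r} - Et)} , \qquad \text{(A47)} $$

where w(k, E) is a four-component spinor which satisfies the equation

$$ (\gamma^0 E - \boldsymbol{\gamma} \cdot \mathbf{k} - m)\, w(\mathbf{k}, E) = 0 . \qquad \text{(A48)} $$

Let us now write w(k, E) in terms of two two-component objects w₊ and w₋:

$$ w(\mathbf{k}, E) = \begin{pmatrix} w_+(\mathbf{k}, E) \\ w_-(\mathbf{k}, E) \end{pmatrix} . \qquad \text{(A49)} $$

The quantities $w_\pm$ must satisfy the two coupled two-component equations:

$$ \boldsymbol{\sigma} \cdot \mathbf{k}\, w_- = (E - m)\, w_+ , \qquad \boldsymbol{\sigma} \cdot \mathbf{k}\, w_+ = (E + m)\, w_- . \qquad \text{(A50)} $$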

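For m ≠ 0 it is convenient to employ the rest frame of the particle, in which k = 0. We have either

$$ E = m , \qquad w_+ \neq 0 , \qquad w_- = 0 , \qquad \text{(A51)} $$

or

$$ E = -m , \qquad w_+ = 0 , \qquad w_- \neq 0 . \qquad \text{(A52)} $$

The nonzero component $w_\pm$ can be chosen arbitrarily, so it is convenient to choose eigenstates of $\sigma_z$. Thus we have the four solutions:

$$ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} , \qquad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} , \qquad \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} , \qquad \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} . \qquad \text{(A53)} $$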

The first two solutions correspond to particles of energy E = m with spin up and down, respectively, while the latter two solutions correspond to particles of energy E = −m. The existence of negative-energy solutions constitutes the problem which, as mentioned above, affects all first-quantization relativistic formulations. Dirac suggested that these solutions should be interpreted as anti-particles (i.e., positrons in the case of electrons). Note how spin, introduced as an ad hoc new degree of freedom in conventional Quantum Mechanics, now emerges naturally from the formulation of a relativistic wave equation. This constitutes a major success of Dirac’s theory.


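For arbitrary k the solutions are:

$$ u_{\mathbf{k},\sigma} = w_\sigma(\mathbf{k}, E_k) = \left( \frac{E_k + m}{2m} \right)^{1/2} \begin{pmatrix} \xi_\sigma \\ \dfrac{\boldsymbol{\sigma} \cdot \mathbf{k}}{E_k + m}\, \xi_\sigma \end{pmatrix} , \qquad \text{(A54)} $$

(for σ = 1,2) and

$$ v_{\mathbf{k},\sigma} = w_\sigma(-\mathbf{k}, -E_k) = \left( \frac{E_k + m}{2m} \right)^{1/2} \begin{pmatrix} \dfrac{\boldsymbol{\sigma} \cdot \mathbf{k}}{E_k + m}\, \xi_\sigma \\ \xi_\sigma \end{pmatrix} . \qquad \text{(A55)} $$

In these expressions $\xi_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\xi_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Considering now the case m = 0 (as for the early theories of neutrinos or looking at the analogy with the dispersion in graphene), we can choose the 4 independent 4-component spinors (in place of Eqns. (A54) and (A55)):

$$ u_{\mathbf{k},\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} \zeta_\sigma \\ \dfrac{\boldsymbol{\sigma} \cdot \mathbf{k}}{k}\, \zeta_\sigma \end{pmatrix} , \qquad \text{(A56)} $$

(for σ = 1,2) and

$$ v_{\mathbf{k},\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} \dfrac{\boldsymbol{\sigma} \cdot \mathbf{k}}{k}\, \zeta_\sigma \\ \zeta_\sigma \end{pmatrix} , \qquad \text{(A57)} $$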

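corresponding to the positive- and negative-energy dispersion E = ±k on the ‘light cone’. The 2-component spinors $\zeta_\sigma$ are now chosen not as eigenstates of $\sigma_z$, but as eigenstates of the ‘helicity’ σ · k/k:

$$ \boldsymbol{\sigma} \cdot \mathbf{n}\, \zeta_1 = \zeta_1 , \qquad \boldsymbol{\sigma} \cdot \mathbf{n}\, \zeta_2 = -\zeta_2 , \qquad \text{(A58)} $$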

where n = k/k is the unit vector along the direction of motion. This is rendered necessary by a profound difference between massless and massive particles: For massive particles one is free to select arbitrarily the rest-frame form of $w_\pm$ in Eqns. (A51)-(A55), since one can choose to define the polarization in the rest frame along any arbitrary direction. On the contrary, for massless particles traveling at the speed of light there is no such thing as a ‘rest frame’ and the polarization must be measured along the direction of motion, k/k.
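Both statements can be illustrated numerically: the massive spinor of Eq. (A54) satisfies the coupled equations (A50), and explicit helicity eigenstates satisfy Eq. (A58). The sketch below is a minimal check in units with c = ħ = 1. The sample momentum, the mass, the helper names, and in particular the phase convention chosen for $\zeta_{1,2}$ in terms of the polar angles of k are assumptions made for this example; the notes do not fix them.

```python
import cmath
import math

def sigma_dot(v):
    """2x2 matrix sigma . v for a real 3-vector v (Pauli matrices, Eq. (A42))."""
    vx, vy, vz = v
    return [[vz, vx - 1j * vy], [vx + 1j * vy, -vz]]

def apply(mat, spinor):
    """Apply a 2x2 matrix to a 2-component spinor."""
    return [mat[0][0] * spinor[0] + mat[0][1] * spinor[1],
            mat[1][0] * spinor[0] + mat[1][1] * spinor[1]]

def close(a, b, tol=1e-12):
    """Component-wise comparison of two spinors."""
    return all(abs(x - y) < tol for x, y in zip(a, b))

def massive_spinor(k, m, xi):
    """Energy and upper/lower components (w_plus, w_minus) of Eq. (A54)."""
    E = math.sqrt(sum(c * c for c in k) + m * m)
    norm = math.sqrt((E + m) / (2.0 * m))
    w_plus = [norm * x for x in xi]
    w_minus = [norm * y / (E + m) for y in apply(sigma_dot(k), xi)]
    return E, w_plus, w_minus

def helicity_spinors(k):
    """Eigenstates of sigma . n with eigenvalues +1 and -1 (assumed phases)."""
    kx, ky, kz = k
    kn = math.sqrt(kx * kx + ky * ky + kz * kz)
    theta, phi = math.acos(kz / kn), math.atan2(ky, kx)
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    zeta1 = [c, cmath.exp(1j * phi) * s]
    zeta2 = [-cmath.exp(-1j * phi) * s, c]
    return zeta1, zeta2

k = (0.3, -0.4, 1.2)   # hypothetical momentum, |k| = 1.3
m = 1.0                # hypothetical mass

# Massive case: check both equations of (A50) for the spin-up spinor.
E, wp, wm = massive_spinor(k, m, [1.0, 0.0])
ok_first = close(apply(sigma_dot(k), wm), [(E - m) * x for x in wp])
ok_second = close(apply(sigma_dot(k), wp), [(E + m) * x for x in wm])

# Massless case: check the helicity conditions of Eq. (A58).
kn = math.sqrt(sum(c * c for c in k))
n = [c / kn for c in k]
z1, z2 = helicity_spinors(k)
ok_plus = close(apply(sigma_dot(n), z1), z1)
ok_minus = close(apply(sigma_dot(n), z2), [-x for x in z2])

print(ok_first, ok_second, ok_plus, ok_minus)
```

Unlike the massive case, where any two-component ξ could be used, the helicity spinors depend on the direction of k, which is the point made in the paragraph above.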