Particles and Fields

5.4 Einstein Sum Convention

If necessity is the mother of invention, laziness is the father. The Einstein summation convention is an offspring of this happy marriage. We introduced it in Section 4.4.2, and now we explore its use a little further.

Whenever you see the same index both downstairs and upstairs in a single term, you automatically sum over that index. Summation is implied, and you don’t need a summation symbol. For example, the term

A^μA_μ

means

A⁰A₀ + A¹A₁ + A²A₂ + A³A₃

because the same index μ appears both upstairs and downstairs in the same term. On the other hand, the term

A_νA^μ

does not imply summation, because the upstairs and downstairs indices are not the same. Likewise,

A_νA_ν

does not imply summation even though the index ν is repeated, because both indices are downstairs.
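The rule can be stated mechanically, and a short Python sketch may help make it concrete. The function name `summed_indices` and the representation of a term as a list of (index name, position) pairs are assumptions made for this illustration, not notation from the text.

```python
from collections import Counter

def summed_indices(term):
    """Return the index names that trigger summation in a single term.

    `term` is a list of (index_name, position) pairs, one per index,
    where position is "up" or "down". An index is summed only when it
    appears exactly once upstairs and exactly once downstairs.
    """
    counts = Counter(term)
    names = {name for name, _ in term}
    return {n for n in names
            if counts[(n, "up")] == 1 and counts[(n, "down")] == 1}

print(summed_indices([("mu", "up"), ("mu", "down")]))    # A^mu A_mu -> {'mu'}
print(summed_indices([("nu", "down"), ("mu", "up")]))    # A_nu A^mu -> set()
print(summed_indices([("nu", "down"), ("nu", "down")]))  # A_nu A_nu -> set()
```

The three calls mirror the three cases above: one matched upstairs-downstairs pair, two different indices, and one index repeated downstairs.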

You may recall that some of the equations in Section 3.4.3 used the symbol to signify the sum of squares of space components. By using upstairs and downstairs indices along with the summation convention, we could have written

which is more elegant and precise.

The operation of Expression 5.19 has the effect of changing the sign of the time component. I should warn you that some authors follow the convention (+1, −1, −1, −1) for the placement of these minus signs. I prefer the convention (−1, 1, 1, 1), typically used by those who study general relativity.

An index that triggers the summation convention, like ν in the following example, doesn’t have a specific value. It’s called a summation index or a dummy index; it’s a thing you sum over. By contrast, an index that is not summed over is called a free index. The expression

A_μ = η_μνA^ν

depends on μ (which is a free index), but it doesn’t depend on the summation index ν. If we replace ν with any other Greek letter, the expression would have exactly the same meaning.

I should also mention that the terms upstairs index and downstairs index have formal names. An upper index is called contravariant, and a lower index is called covariant. I often use the simpler words upper and lower, but you should learn the formal terms as well. We can have A with an upper (contravariant) index, or A with a lower (covariant) index, and we use the matrix η to convert one to the other. Converting one kind of index to the other kind is called raising the index or lowering the index, depending on which way we go.
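As a rough illustration of free and dummy indices, here is a minimal Python sketch of lowering an index with η in the (−1, 1, 1, 1) convention, that is, A_μ = η_μνA^ν. The names `ETA`, `lower_index`, and `lower_index_renamed`, and the sample components, are hypothetical choices for this sketch.

```python
# Metric eta in the (-1, 1, 1, 1) convention, stored as a 4x4 nested list.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def lower_index(a_upper):
    """Return A_mu = eta_{mu nu} A^nu: mu is free, nu is summed over."""
    return [sum(ETA[mu][nu] * a_upper[nu] for nu in range(4))
            for mu in range(4)]

def lower_index_renamed(a_upper):
    """Same computation with the dummy index called sigma instead of nu."""
    return [sum(ETA[mu][sigma] * a_upper[sigma] for sigma in range(4))
            for mu in range(4)]

a_upper = [2, 3, 5, 7]                 # hypothetical (A^t, A^x, A^y, A^z)
print(lower_index(a_upper))            # [-2, 3, 5, 7]
print(lower_index(a_upper) == lower_index_renamed(a_upper))  # True
```

The result is a list with one entry per value of the free index μ. Renaming the inner loop variable changes nothing, which is the sense in which ν is a dummy.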

Exercise 5.1: Show that A^νA_ν has the same meaning as A^μA_μ.

Exercise 5.2: Write an expression that undoes the effect of Eq. 5.20. In other words, how do we “go backwards”?

Let’s have another look at Expression 5.19, A^μA_μ.

This expression is summed over because it contains a repeated index, one upper and one lower. Previously, we expanded it using the indices 0 through 3.

We can write the same expression using the labels t, x, y, and z:

A^μ A_μ = A^tA_t + A^xA_x + A^yA_y + A^zA_z.

For the three space components, the covariant and contravariant versions are exactly the same. The first space component is just (A^x)², and it doesn’t matter whether you put the index upstairs or downstairs. The same is true for the y and z components. But the time component becomes −(A^t)²,

A^μ A_μ = −(A^t)² + (A^x)² + (A^y)² + (A^z)².

The time component has a minus sign because the operation of lowering or raising that index changes its sign. The contravariant and covariant time components have opposite signs, and A^t times A_t is −(A^t)². On the other hand, the contravariant and covariant space components have the same signs.
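To see the signs at work with numbers, here is a small Python sketch. It assumes the (−1, 1, 1, 1) convention used here; the helper names `lower_index` and `contract` and the sample components are invented for the example.

```python
def lower_index(a_upper):
    """Lower the index: flip the sign of the time component only."""
    return [-a_upper[0]] + list(a_upper[1:])

def contract(upper, lower):
    """Sum the products of matching components: A^t A_t + A^x A_x + ..."""
    return sum(u * l for u, l in zip(upper, lower))

a_upper = [2, 3, 5, 7]                 # hypothetical (A^t, A^x, A^y, A^z)
a_lower = lower_index(a_upper)

print(a_lower)                         # [-2, 3, 5, 7]
print(contract(a_upper, a_lower))      # -4 + 9 + 25 + 49 = 79
```

Only the time term contributes with a minus sign; the three space terms are ordinary squares.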

The quantity A^μA_μ is exactly what we think of as a scalar. It’s the sum of the squares of the space components minus the square of the time component. If A^μ happens to be a displacement such as X^μ, then it’s the same as the quantity τ², except with an overall minus sign; in other words, it’s −τ². But whatever sign it has, this sum is clearly a scalar.
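One way to check the scalar claim numerically is to evaluate A^μA_μ in two frames related by a Lorentz boost and compare. The sketch below assumes a boost along the x axis in units where c = 1; the function names and sample values are hypothetical.

```python
import math

def invariant(a_upper):
    """A^mu A_mu in the (-1, 1, 1, 1) convention."""
    t, x, y, z = a_upper
    return -t**2 + x**2 + y**2 + z**2

def boost_x(a_upper, v):
    """Components seen from a frame moving with velocity v along x (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v**2)
    t, x, y, z = a_upper
    return [gamma * (t - v * x), gamma * (x - v * t), y, z]

a = [2.0, 3.0, 5.0, 7.0]               # hypothetical 4-vector components
a_moving = boost_x(a, 0.6)

# The individual components change, but the contraction does not.
print(math.isclose(invariant(a), invariant(a_moving)))  # True
```

The comparison uses `math.isclose` rather than `==` because the boosted components are floating-point numbers.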

This process is called contracting the indices, and it’s very general. As long as A^μ is a 4-vector, the quantity A^μA_μ is a scalar. We can take any 4-vector at all and make a scalar by contracting its indices. We can also write A^μA_μ a little differently by referring to Eq. 5.20 and replacing A_μ with η_μνA^ν. In other words, we can write

A^μA_μ = η_μνA^μA^ν. (5.21)

On the right side, we use the metric η and sum over μ and ν. Both sides of Eq. 5.21 represent the same scalar.

Now let’s look at an example involving two different 4-vectors, A and B. Consider the expression

A^μB_μ.

Is this a scalar? It certainly looks like one. It has no indices because all the indices have been summed over.

To prove that it’s a scalar, we’ll need to rely on the fact that the sums and differences of scalars are also scalars. If we have two scalar quantities, then by definition you and I will agree about their values even though our reference frames are different. But if we agree about their values, we must also agree about the value of their sum and the value of their difference. Therefore, the sum of two scalars is a scalar, and the difference of two scalars is also a scalar. If we keep this in mind, the proof is easy. Just start with two 4-vectors A^μ and B^μ and write the expression

(A + B)^μ(A + B)_μ.

This expression must be a scalar. Why is that? Because both A^μ and B^μ are 4-vectors, their sum (A + B)^μ is also a 4-vector. If you contract any 4-vector with itself, the result is a scalar. Now, let’s modify this expression by subtracting (A − B)^μ(A − B)_μ. This becomes

(A + B)^μ(A + B)_μ − (A − B)^μ(A − B)_μ. (5.22)

This modified expression is still a scalar because it’s the difference of two scalars. If we expand the expression, we find that the A^μA_μ terms cancel, and so do the B^μB_μ terms. The only remaining terms are A^μB_μ and A_μB^μ, and the result is

2[A^μB_μ + A_μB^μ].

I’ll leave it as an exercise to prove that

A^μB_μ = A_μB^μ.

It doesn’t matter if you put the ups down and the downs up; the result is the same. Therefore, the expression evaluates to

4A^μB_μ.

Because we know that the original Expression 5.22 is a scalar, the result A^μB_μ must also be a scalar.
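The algebra behind this argument can be spot-checked with numbers. The following Python sketch evaluates Expression 5.22 for two sample 4-vectors and compares it with four times A^μB_μ. The helper `minkowski_dot` and the component values are assumptions for illustration, with the (−1, 1, 1, 1) convention built in.

```python
def minkowski_dot(a_upper, b_upper):
    """A^mu B_mu: minus the time product plus the three space products."""
    return (-a_upper[0] * b_upper[0]
            + sum(a * b for a, b in zip(a_upper[1:], b_upper[1:])))

a = [2, 3, 5, 7]                       # hypothetical components of A^mu
b = [1, 4, 6, 8]                       # hypothetical components of B^mu

a_plus_b = [x + y for x, y in zip(a, b)]
a_minus_b = [x - y for x, y in zip(a, b)]

# Expression 5.22: (A + B)^mu (A + B)_mu - (A - B)^mu (A - B)_mu
expr_5_22 = (minkowski_dot(a_plus_b, a_plus_b)
             - minkowski_dot(a_minus_b, a_minus_b))

print(expr_5_22)                       # 384
print(4 * minkowski_dot(a, b))         # 384
```

A single numerical match is not a proof, but it is a quick guard against sign errors when expanding the expression by hand.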

You may have noticed that the expression A^μB_μ looks a lot like the ordinary dot product of two space vectors. You can think of A^μB_μ as the Lorentz or Minkowski version of the dot product. The only real difference is the change of sign for the time component, facilitated by the metric η.

Published by Basic Books, an imprint of Perseus Books, LLC, a subsidiary of Hachette Book Group, Inc. (pages 158-162)