The Geometry behind the Special Relativity Theory


Paul Anthony Frontéri, June 4, 2012

Bachelor Project in Mathematics


Abstract

This project is an attempt to approach known physical concepts from a mathematical point of view, or more precisely, from a geometric framework. In particular, we study concepts from special relativity (the Lorentz transformation), electromagnetism (skew-symmetry) and quantum mechanics/quantum fields (spinors).

Most of the material of the project is based on the book "The Geometry of Minkowski Space. An introduction to the Mathematics of the Special Relativity Theory" by Gregory L. Naber [1]. Other literature has also been used to complement the mathematical understanding, such as "Semi-Riemannian Geometry" by Barrett O'Neill [2], "Introduction to 2-Spinors in General Relativity" by Peter O'Donnell [3] and "Riemannian manifolds" by John M. Lee [4]. My education in physics allowed me to understand the content and motivation of Naber's book, but from my point of view it lacks some mathematical rigor and care. One of the aims of the project was to complement the physically interesting material of Naber's book with a mathematical understanding of the geometry and the algebraic structures behind the physical processes.

The first Chapter gives the motivation for studying the special relativity theory from a mathematical point of view. The second Chapter gives an account of the geometry of Minkowski space and of the Lorentz transformations of this space. Sections 2.1 to 2.5 develop the basic theory, and Section 2.6 presents some practical examples of the use of the Lorentz transformation, in order to connect the mathematical theory with physics. The third Chapter goes further and introduces particles and charged particles, which naturally leads to the electric and magnetic fields (electromagnetism). As in the second Chapter, Sections 3.1 and 3.2 deal with the mathematical framework, while Section 3.3 gives the physical application of this framework. The last Chapter introduces one additional structure that our particles possess, namely spin. Sections 4.1 to 4.4 present the theory, and Section 4.5 gives a clear picture of the "double-valued" property of spinors by means of the null flag.


Acknowledgements

I wish to thank my supervisor on this project, Irina Markina. Thank you for giving me an interesting, exciting and sometimes very hard project, and also, just as importantly, for your patience and for believing in me.

Paul Anthony Frontéri, 09 May, 2012


1 Our world as a space-time

Our world, the universe, space-time or reality is a four-dimensional space endowed with a special kind of metric called the Minkowski metric. These are different names for one common concept, namely the place we live and breathe in. Nevertheless, the reality of our habitat has not always been thought of in this form. Long ago people thought that the world was flat, a disk on which everything stood, and that sailors on the far seas had to be careful not to fall into the deep of nothing.

Following the geometry of Euclid, Descartes assigned to each observer O a coordinate system (x₁, x₂, x₃), which one can identify with the three spatial dimensions.

Galileo Galilei discovered that the laws of mechanics are the same for all observers moving uniformly relative to one another. So, given an observation made by one observer O, one can translate it to another observer Õ by a Galilean transformation. Time was disconnected from space itself. The ancient Greeks thought that time was not real, but just an illusion of the mind, of the flow of the world.

Newton made it more concrete by declaring it absolute.

The first to imagine the universe in more than three dimensions were mathematicians: Gauss, Riemann, Möbius, Lorentz, Poincaré and Minkowski, to mention just a few of them.

They assumed that reality should not only have more dimensions, but should also be endowed with curvature. The curvature could be positive, like that of the sphere Sⁿ, or negative, like that of the hyperbolic space Hⁿ. But what do the spaces of the mathematical imagination have to do with the real world we live in? It turned out that they are much more closely related than one could expect. The starting point in the understanding of this connection belongs to the physicist Albert Einstein, who formulated the theories of special and general relativity.

As a first step, our three-dimensional space and time were no longer thought of as two separate concepts, but became parts of a bigger space. Namely, they formed space-time, a four-dimensional space furnished with special tools of measurement.

Each event in the universe is described as one point in this space-time. The birth and the death of Einstein are two different points in space-time, connected by the worldline of Einstein himself: the journey of his life through this universe of ours. However, binding space and time together was not all. They also became interconnected: space and time were no longer absolute, as they were in Newton's theory, but relative to each observer's viewpoint. Hence the name relativity theory.

The distance between two points in space-time was no longer given by the Euclidean metric but by the Minkowski metric. The transformation between two observers changed from the Galilean transformation to the Lorentz transformation, which preserves the new type of metric of the universe.


2 Geometry of Minkowski space and the Lorentz transformations

This chapter goes through the general geometric structures of Minkowski space, such as the inner product, the null cone, future-directedness and causality. The Lorentz transformations and the Lorentz group are also defined. Properties of timelike and spacelike vectors are discussed, while their physical meaning is postponed to the end of the Chapter.

2.1 Definitions

Let V be an arbitrary vector space of dimension n ≥ 1 over the real numbers R, and let v, ω be elements (vectors) of this vector space.

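Definition 2.1 (Bilinear form). A bilinear form on V is a map g ∶ V × V → R that is linear in each variable, i.e. such that

$$g(a_1 v_1 + a_2 v_2, \omega) = a_1 g(v_1, \omega) + a_2 g(v_2, \omega), \qquad g(v, a_1 \omega_1 + a_2 \omega_2) = a_1 g(v, \omega_1) + a_2 g(v, \omega_2)$$

for all a₁, a₂ ∈ R and all v, v₁, v₂, ω, ω₁, ω₂ ∈ V.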

A bilinear form g is symmetric if g(v, ω) = g(ω, v) for all v, ω ∈ V, and non-degenerate if g(v, ω) = 0 for all ω ∈ V implies v = 0.

Definition 2.2 (Scalar product). A non-degenerate, symmetric, bilinear form g is called a scalar product, and the value g(v, ω) is written v⋅ω.

Note that Naber [1] uses the term inner product for the scalar product. In general, a scalar product g is called positive definite (or an inner product) if g(v, v) > 0 for all v ≠ 0, negative definite if g(v, v) < 0 for all v ≠ 0, and indefinite if it is neither positive nor negative definite. A vector space V together with a scalar product g, written (V, g), is called a scalar space.

Definition 2.3 (Orthogonality). If (V, g) is an n-dimensional scalar space, then two vectors v, ω ∈ V are said to be g-orthogonal (or just orthogonal) if g(v, ω) = 0.

Definition 2.4 (Orthogonal complement). If W is a subspace of V, then the orthogonal complement W^⊥ of W in V is defined by

$$W^{\perp} = \{ v \in V \mid g(v, \omega) = 0 \ \text{for all } \omega \in W \}.$$

Definition 2.5 (Quadratic form). The quadratic form associated with a scalar product g on an n-dimensional vector space V is the map Q ∶ V → R defined by Q(v) = g(v, v) = v², v ∈ V.

Definition 2.6 (Index of a bilinear form [2]). The index of a symmetric bilinear form g on a vector space V is the largest dimension of a subspace W ⊆ V on which g is negative definite.

Theorem 2.7 ([1] page 8, [2]). Let V be an n-dimensional real vector space on which a non-degenerate, symmetric, bilinear form g ∶ V × V → R is defined. Then there exists a basis {e₁, …, e_n} for V such that g(e_i, e_j) = 0 if i ≠ j and Q(e_i) = ±1 for each i = 1, …, n.

Moreover, the number of basis vectors e_i for which Q(e_i) = −1 is the same for any such basis and coincides with the index of g. Such a basis is called an orthonormal basis.


2.2 Minkowski Space

All the definitions in this section have been taken from Naber [1], pages 9-18 and 64-66.

The Minkowski spacetime is a 4-dimensional real vector space M on which a non-degenerate, symmetric, bilinear form g of index 1 is defined. The points of M are called events in physics, and g is referred to as a Lorentz scalar product on M.

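We make use of the Einstein summation convention: an index that appears twice in a term, once as a superscript and once as a subscript, is summed over. For instance, if the indices a and b range over the set {1, 2, 3, 4}, then

$$x^a e_a = \sum_{a=1}^{4} x^a e_a = x^1 e_1 + x^2 e_2 + x^3 e_3 + x^4 e_4,$$

$$\Lambda^a{}_b x^b = \sum_{b=1}^{4} \Lambda^a{}_b x^b = \Lambda^a{}_1 x^1 + \Lambda^a{}_2 x^2 + \Lambda^a{}_3 x^3 + \Lambda^a{}_4 x^4,$$

$$\eta_{ab} v^a \omega^b = \eta_{11} v^1 \omega^1 + \eta_{12} v^1 \omega^2 + \eta_{13} v^1 \omega^3 + \eta_{14} v^1 \omega^4 + \eta_{21} v^2 \omega^1 + \dots + \eta_{44} v^4 \omega^4.$$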
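The summation convention is just shorthand for nested sums. The following is a minimal sketch in plain Python, assuming components are stored in lists where position a − 1 holds the component with index a; the helper names `apply` and `eta_contract` are illustrative and not taken from the source.

```python
# Index a = 1, ..., 4 in the text corresponds to list position a - 1 here.
ETA = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, -1]]


def apply(lam, x):
    """Components Lambda^a_b x^b: for each free index a, sum over the repeated index b."""
    return [sum(lam[a][b] * x[b] for b in range(4)) for a in range(4)]


def eta_contract(v, w):
    """The number eta_ab v^a w^b: both a and b are repeated, so both are summed."""
    return sum(ETA[a][b] * v[a] * w[b] for a in range(4) for b in range(4))


identity = [[1 if a == b else 0 for b in range(4)] for a in range(4)]
x = [1, 2, 3, 4]
print(apply(identity, x))   # [1, 2, 3, 4]
print(eta_contract(x, x))   # 1 + 4 + 9 - 16 = -2
```

Only the free index survives in `apply`, while `eta_contract` returns a single number because every index in η_ab v^a ω^b is repeated.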

An event x ∈ M expressed in the orthonormal basis {e₁, e₂, e₃, e₄} = {e_a} of M is written as x = x^a e_a = x¹e₁ + x²e₂ + x³e₃ + x⁴e₄. Here (x¹, x², x³, x⁴) are called the coordinates of x relative to the basis {e_a}, with spatial coordinates (x¹, x², x³) and time coordinate x⁴. Writing two elements v, ω ∈ M relative to the same basis of M, one has v = v^a e_a, ω = ω^a e_a, and

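$$g(v, \omega) = v^1 \omega^1 + v^2 \omega^2 + v^3 \omega^3 - v^4 \omega^4.$$

The metric written in matrix form is

$$\eta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$$

with entries η_ab. Thus

$$\eta_{ab} = \begin{cases} 1 & a = b = 1, 2, 3, \\ -1 & a = b = 4, \\ 0 & \text{otherwise}. \end{cases}$$

One can then write g(e_a, e_b) = η_ab and g(v, ω) = η_ab v^a ω^b. The inverse matrix of η = (η_ab) is denoted by η⁻¹ = (η^ab); it coincides with η.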

Since the Lorentz scalar product is not positive definite, there exist nonzero vectors v ∈ M such that g(v, v) = Q(v) = 0. Such vectors are called null (or lightlike) vectors.

Theorem 2.8 ([1] page 10, [2]). Two nonzero null vectors v and ω in M are orthogonal if and only if they are parallel, i.e., if and only if there is a t ∈ R such that v = tω.

Consider two distinct events x₀ and x for which the displacement vector v = x − x₀ from x₀ to x is null, i.e., Q(v) = Q(x − x₀) = 0. Expressing this relative to an orthonormal basis {e_a}, with x = x^a e_a and x₀ = x₀^a e_a, we obtain

$$(x^1 - x_0^1)^2 + (x^2 - x_0^2)^2 + (x^3 - x_0^3)^2 - (x^4 - x_0^4)^2 = 0.$$

This equation describes a cone in the four-dimensional space R⁴ with vertex at the point (x₀¹, x₀², x₀³, x₀⁴).
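Membership in the cone described by this equation can be tested directly. The following is a minimal Python sketch, assuming coordinates relative to an orthonormal basis with the time coordinate x⁴ stored last; `on_null_cone` and its tolerance `tol` (for floating-point input) are hypothetical names chosen for illustration.

```python
def Q(v):
    """Quadratic form Q(v) = (v^1)^2 + (v^2)^2 + (v^3)^2 - (v^4)^2."""
    return v[0] ** 2 + v[1] ** 2 + v[2] ** 2 - v[3] ** 2


def on_null_cone(x, x0, tol=1e-12):
    """True if the displacement x - x0 is null, i.e. x lies on the cone with vertex x0."""
    d = [xa - x0a for xa, x0a in zip(x, x0)]
    return abs(Q(d)) <= tol


x0 = [1.0, 0.0, 0.0, 2.0]
print(on_null_cone([4.0, 4.0, 0.0, 7.0], x0))  # displacement (3, 4, 0, 5): 9 + 16 - 25 = 0, so True
print(on_null_cone([1.0, 0.0, 0.0, 5.0], x0))  # displacement (0, 0, 0, 3): Q = -9, so False
```

The vertex x₀ itself also passes the test, since Q(0) = 0; Definition 2.9 below likewise includes x₀ in C_N(x₀).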


Definition 2.9 (Null cone, null line). The null cone (or light cone) C_N(x₀) at x₀ in M is defined by

$$C_N(x_0) = \{ x \in M \mid Q(x - x_0) = 0 \}.$$

The null line (or, physically, a light ray) R_{x₀,x} containing x₀ and x is defined by

$$R_{x_0,x} = \{ x_0 + t(x - x_0) \mid t \in \mathbb{R} \}.$$

Notice that R_{x₀,x} = R_{x,x₀}.

Theorem 2.10 ([1] page 12). Let x₀ and x be two distinct events with Q(x − x₀) = 0. Then R_{x₀,x} = C_N(x₀) ∩ C_N(x).

A vector v in M is said to be timelike if Q(v) < 0, and spacelike if Q(v) > 0 or v is the zero vector. If the displacement vector v = x − x₀ is timelike, then x lies inside the null cone C_N(x₀); if it is spacelike and nonzero, then x lies outside the cone. A timelike line is a line T = {x₀ + t(x − x₀) | t ∈ R} in M for which x − x₀ is a timelike vector.
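The classification can be written as a small function. This sketch assumes exact (integer or rational) components relative to an orthonormal basis, with x⁴ last; with floating-point data one would compare Q(v) against a tolerance instead of testing for exact zero. Following the convention just stated, the zero vector is reported as spacelike. The function name is illustrative.

```python
def Q(v):
    """Quadratic form of the Lorentz scalar product, time component last."""
    return v[0] ** 2 + v[1] ** 2 + v[2] ** 2 - v[3] ** 2


def causal_character(v):
    """Return 'timelike', 'null' or 'spacelike' according to the sign of Q(v)."""
    q = Q(v)
    if q < 0:
        return "timelike"
    if q == 0 and any(component != 0 for component in v):
        return "null"
    return "spacelike"  # Q(v) > 0, or v is the zero vector


for v in ([0, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0]):
    print(v, causal_character(v))
```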

Theorem 2.11 ([1] page 16). Suppose that v is timelike and ω is either timelike or nonzero null. Let {e_a} be an orthonormal basis for M with v = v^a e_a and ω = ω^a e_a. Then either

(a) v⁴ω⁴>0, in which case g(v, ω)<0, or (b) v⁴ω⁴<0, in which case g(v, ω)>0.

Corollary 2.12 ([1] page 16). If a nonzero vector in M is orthogonal to a timelike vector, then it must be spacelike.

We have now found some basic building blocks of Minkowski space, but we would also like to have some sort of time orientation in M. We therefore define the future and past directions.

Definition 2.13 (Future- and past-directed equivalence relation). Let τ denote the collection of all timelike vectors in M. We define an equivalence relation ∼ on τ as follows: if v and ω are in τ, then v ∼ ω if and only if g(v, ω) < 0.

Note that, by Theorem 2.11, v ∼ ω holds exactly when the coordinates v⁴ and ω⁴ have the same sign, in any orthonormal basis of M. Consequently, relative to a fixed orthonormal basis, τ is the union of two disjoint subsets: τ⁺, consisting of the timelike vectors with v⁴ > 0, and τ⁻, consisting of those with v⁴ < 0. They have the following properties: v ∼ ω for all v, ω ∈ τ⁺; v ∼ ω for all v, ω ∈ τ⁻; and v ≁ ω if one of the vectors v, ω is in τ⁺ and the other belongs to τ⁻. The elements of τ⁺ are called future-directed and the elements of τ⁻ past-directed.
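The relation ∼ and the split τ = τ⁺ ∪ τ⁻ can be checked numerically. The sketch below assumes components relative to a fixed orthonormal basis (time component last) and uses the sign of v⁴ to decide membership in τ⁺; the function names are illustrative.

```python
def dot(v, w):
    """Lorentz scalar product g(v, w) = v^1 w^1 + v^2 w^2 + v^3 w^3 - v^4 w^4."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]


def is_timelike(v):
    return dot(v, v) < 0


def related(v, w):
    """The relation v ~ w of Definition 2.13; both arguments must be timelike."""
    if not (is_timelike(v) and is_timelike(w)):
        raise ValueError("~ is only defined on timelike vectors")
    return dot(v, w) < 0


def in_tau_plus(v):
    """Timelike with positive time component relative to the fixed basis."""
    return is_timelike(v) and v[3] > 0


u = [0.5, 0.0, 0.0, 1.0]
w = [0.0, 0.9, 0.0, 2.0]
p = [0.0, 0.0, 0.0, -1.0]
print(related(u, w), in_tau_plus(u), in_tau_plus(w))  # True True True
print(related(u, p), in_tau_plus(p))                  # False False
```

Consistently with Theorem 2.11, `related(v, w)` returns True exactly when v⁴ and w⁴ have the same sign.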

Definition 2.14 (Time cone, future time cone and past time cone). For each x₀ in M we define the time cone C_T(x₀), the future time cone C_T⁺(x₀) and the past time cone C_T⁻(x₀) at x₀ by

$$C_T(x_0) = \{ x \in M \mid Q(x - x_0) < 0 \},$$

$$C_T^{+}(x_0) = \{ x \in M \mid x - x_0 \in \tau^{+} \} = x_0 + \tau^{+}, \qquad C_T^{-}(x_0) = \{ x \in M \mid x - x_0 \in \tau^{-} \} = x_0 + \tau^{-}.$$

The time cone C_T(x₀) is an open solid cone whose boundary is the null cone C_N(x₀), and it is the disjoint union of C_T⁺(x₀) and C_T⁻(x₀).


Definition 2.15 (Future- and past-directed nonzero null vectors). A nonzero null vector n is future-directed if n⋅v < 0 for all v ∈ τ⁺, and past-directed if n⋅v > 0 for all v ∈ τ⁺.

Note that only τ⁺ is used to define the direction of a null vector. This suffices because, for a fixed nonzero null vector n, the product n⋅v has the same sign for all v ∈ τ⁺ (by Theorem 2.11).

Definition 2.16 (Future and past null cone at x₀). For any x₀ ∈ M we define the future null cone at x₀ by

$$C_N^{+}(x_0) = \{ x \in C_N(x_0) \mid x - x_0 \text{ is future-directed} \},$$

and the past null cone at x₀ by

$$C_N^{-}(x_0) = \{ x \in C_N(x_0) \mid x - x_0 \text{ is past-directed} \}.$$

Figure: Illustration of the future and past cones at a point. Time is the vertical axis and space the other axes; a plane of simultaneity contains the events that happen at the same time.

With the basic building blocks of Minkowski space and a time orientation, one can define a causal structure on M.

Definition 2.17 (Chronological and causal relations). There are two relations on M, called the causality relations on M. For two events x and y in M we say that

1) the event x chronologically precedes y, written x ≪ y, if the displacement vector y − x is timelike and future-directed;

2) the event x causally precedes y, written x < y, if the displacement vector y − x is null and future-directed.
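Relative to an admissible basis (so that e₄ is future-directed), a timelike or nonzero null vector is future-directed exactly when its fourth component is positive: take v = e₄ ∈ τ⁺ in Definition 2.15, where n⋅e₄ = −n⁴, and use the definition of τ⁺ for timelike vectors. Under that assumption, the two causality relations can be sketched as follows (exact integer coordinates, hypothetical function names).

```python
def dot(v, w):
    """Lorentz scalar product with the time component stored last."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]


def displacement(x, y):
    """The vector y - x from event x to event y."""
    return [b - a for a, b in zip(x, y)]


def chronologically_precedes(x, y):
    """x << y: y - x is timelike and future-directed."""
    d = displacement(x, y)
    return dot(d, d) < 0 and d[3] > 0


def causally_precedes(x, y):
    """x < y: y - x is null and future-directed (d[3] > 0 also rules out d = 0)."""
    d = displacement(x, y)
    return dot(d, d) == 0 and d[3] > 0


origin = [0, 0, 0, 0]
print(chronologically_precedes(origin, [1, 0, 0, 2]))  # True: (1, 0, 0, 2) is timelike, future-directed
print(causally_precedes(origin, [3, 4, 0, 5]))         # True: (3, 4, 0, 5) is null, future-directed
print(chronologically_precedes([1, 0, 0, 2], origin))  # False: the displacement is past-directed
print(causally_precedes(origin, [2, 0, 0, 1]))         # False: (2, 0, 0, 1) is spacelike
```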

Definition 2.18 (Causal automorphism). A map F ∶ M → M is said to be a causal automorphism if it is bijective and both F and F⁻¹ preserve the causal relation <, i.e., x < y if and only if F(x) < F(y). Note, in particular, that F is not assumed to be linear or continuous.

Definition 2.19 (Translation). A continuous map T ∶ M → M is said to be a translation if T(v) = v + v₀ for some fixed v₀ ∈ M. Geometrically, it moves every point by the same amount in the direction of v₀.


Definition 2.20 (Dilation). A linear transformation K ∶ M → M is said to be a dilation if K(v) = kv for some positive real number k. Geometrically, it rescales every vector by the factor k.

Definition 2.21 (Orthochronous). An orthogonal transformation L ∶ M → M is said to be orthochronous if x⋅Lx < 0 for all timelike vectors x.

The following theorem describes the structure of causal automorphisms: they are essentially compositions of translations, dilations and orthochronous orthogonal transformations. In this sense the causal automorphisms of Minkowski space are analogous to the Möbius transformations of Euclidean space.

Theorem 2.22 (Zeeman's theorem [1] page 66). Let F ∶ M → M be a causal automorphism of M. Then there exist an orthochronous orthogonal transformation L ∶ M → M, a translation T ∶ M → M and a dilation K ∶ M → M such that F = T ∘ K ∘ L.

2.3 Lorentz Group

We have now seen how the causal structure arises from the Lorentz scalar product on Minkowski space, together with the notions of null, spacelike and timelike vectors. However, we also need to relate two different observers O and Ô in Minkowski space, who may "interpret" the same events differently. One must therefore introduce a transformation rule between two observers, relating an event as viewed by the first observer to the same event as viewed by the second. In Minkowski space these transformations are the Lorentz transformations, and they form the Lorentz group. We shall now look at some of its properties.

All the definitions in this section have been taken from Naber [1], pages 18-22.

Definition 2.23 (Linear and orthogonal transformations). If {e_a} and {ê_a} are two orthonormal bases for M, then there is a unique linear transformation L ∶ M → M such that L(e_a) = ê_a for each a = 1, 2, 3, 4. A linear transformation L ∶ M → M is said to be an orthogonal transformation of M if g(Lx, Ly) = g(x, y) for all x, y ∈ M; in particular, it preserves the quadratic form Q, i.e. the squared "length" of vectors.

Lemma 2.24 ([1] page 13, [2]). Let L ∶ M → M be a linear transformation. Then the following are equivalent:

(a) L is an orthogonal transformation.

(b) L preserves the quadratic form of M, i.e., Q(Lx) = Q(x) for all x ∈ M.

(c) L carries any orthonormal basis of Monto another orthonormal basis of M.

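We define the matrix Λ = (Λ^a_b)_{a,b=1,2,3,4} associated with the orthogonal transformation L and the orthonormal basis {e_a} by

$$\Lambda = \begin{pmatrix} \Lambda^1{}_1 & \Lambda^1{}_2 & \Lambda^1{}_3 & \Lambda^1{}_4 \\ \Lambda^2{}_1 & \Lambda^2{}_2 & \Lambda^2{}_3 & \Lambda^2{}_4 \\ \Lambda^3{}_1 & \Lambda^3{}_2 & \Lambda^3{}_3 & \Lambda^3{}_4 \\ \Lambda^4{}_1 & \Lambda^4{}_2 & \Lambda^4{}_3 & \Lambda^4{}_4 \end{pmatrix},$$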


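meaning that, for two orthonormal bases {e_a} and {ê_a} of M, each element of {e_a} can be expressed as a linear combination of the ê_a:

$$e_a = \Lambda^1{}_a \hat e_1 + \Lambda^2{}_a \hat e_2 + \Lambda^3{}_a \hat e_3 + \Lambda^4{}_a \hat e_4, \qquad a = 1, 2, 3, 4.$$

The orthogonality conditions g(e_c, e_d) = η_cd, c, d = 1, 2, 3, 4, are then written as

$$\Lambda^1{}_c \Lambda^1{}_d + \Lambda^2{}_c \Lambda^2{}_d + \Lambda^3{}_c \Lambda^3{}_d - \Lambda^4{}_c \Lambda^4{}_d = \eta_{cd},$$

or, with the summation convention,

$$\Lambda^a{}_c \Lambda^b{}_d \eta_{ab} = \eta_{cd} \quad\text{and}\quad \Lambda^a{}_c \Lambda^b{}_d \eta^{cd} = \eta^{ab}. \qquad (1)$$

With this matrix, the coordinates x^a of an event relative to {e_a} are transformed into its coordinates x̂^a relative to {ê_a} by

$$\hat x^a = \Lambda^a{}_b x^b, \qquad a = 1, 2, 3, 4, \qquad (2)$$

or, in more detail,

$$\begin{aligned} \hat x^1 &= \Lambda^1{}_1 x^1 + \Lambda^1{}_2 x^2 + \Lambda^1{}_3 x^3 + \Lambda^1{}_4 x^4, \\ \hat x^2 &= \Lambda^2{}_1 x^1 + \Lambda^2{}_2 x^2 + \Lambda^2{}_3 x^3 + \Lambda^2{}_4 x^4, \\ \hat x^3 &= \Lambda^3{}_1 x^1 + \Lambda^3{}_2 x^2 + \Lambda^3{}_3 x^3 + \Lambda^3{}_4 x^4, \\ \hat x^4 &= \Lambda^4{}_1 x^1 + \Lambda^4{}_2 x^2 + \Lambda^4{}_3 x^3 + \Lambda^4{}_4 x^4. \end{aligned}$$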

Definition 2.25 (General (homogeneous) Lorentz transformation). Any 4×4 matrix Λ which satisfies

$$\Lambda^{T} \eta \Lambda = \eta, \qquad (3)$$

where Λ^T denotes the transpose of Λ, is called a general (homogeneous) Lorentz transformation.

Note that Eq. (3) is equivalent to Eq. (1). Moreover, since η² = I, multiplying (3) on the left by η gives (ηΛ^Tη)Λ = I, so the inverse of Λ is Λ⁻¹ = ηΛ^Tη.

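With this one can write the entries of the inverse matrix Λ⁻¹ = (Λ_a^b)_{a,b=1,2,3,4} relative to the orthonormal basis {e_a} as

$$\Lambda^{-1} = \begin{pmatrix} \Lambda_1{}^1 & \Lambda_2{}^1 & \Lambda_3{}^1 & \Lambda_4{}^1 \\ \Lambda_1{}^2 & \Lambda_2{}^2 & \Lambda_3{}^2 & \Lambda_4{}^2 \\ \Lambda_1{}^3 & \Lambda_2{}^3 & \Lambda_3{}^3 & \Lambda_4{}^3 \\ \Lambda_1{}^4 & \Lambda_2{}^4 & \Lambda_3{}^4 & \Lambda_4{}^4 \end{pmatrix} = \begin{pmatrix} \Lambda^1{}_1 & \Lambda^2{}_1 & \Lambda^3{}_1 & -\Lambda^4{}_1 \\ \Lambda^1{}_2 & \Lambda^2{}_2 & \Lambda^3{}_2 & -\Lambda^4{}_2 \\ \Lambda^1{}_3 & \Lambda^2{}_3 & \Lambda^3{}_3 & -\Lambda^4{}_3 \\ -\Lambda^1{}_4 & -\Lambda^2{}_4 & -\Lambda^3{}_4 & \Lambda^4{}_4 \end{pmatrix}. \qquad (4)$$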
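Equation (3) and the inverse formula Λ⁻¹ = ηΛᵀη can be checked numerically. The sketch below uses, as an assumed test case, a boost along the x¹-axis with speed β in units with c = 1 (the kind of special Lorentz transformation treated in Section 2.6.2), written with the time coordinate last; `boost_x1`, `is_general_lorentz` and `lorentz_inverse` are hypothetical helper names, and plain nested lists stand in for a matrix library.

```python
import math

ETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(A):
    return [[A[j][i] for j in range(4)] for i in range(4)]


def boost_x1(beta):
    """Boost along x^1 with speed |beta| < 1, time coordinate last."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, 0, 0, -beta * gamma],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [-beta * gamma, 0, 0, gamma]]


def is_general_lorentz(L, tol=1e-12):
    """Check Eq. (3), L^T eta L = eta, entry by entry up to a tolerance."""
    M = matmul(matmul(transpose(L), ETA), L)
    return all(abs(M[i][j] - ETA[i][j]) <= tol for i in range(4) for j in range(4))


def lorentz_inverse(L):
    """Inverse via Lambda^{-1} = eta Lambda^T eta, without general matrix inversion."""
    return matmul(matmul(ETA, transpose(L)), ETA)


L = boost_x1(0.6)
print(is_general_lorentz(L))            # True
print(matmul(lorentz_inverse(L), L))    # the identity matrix, up to rounding
```

Since η is diagonal with entries ±1, ηΛᵀη differs from Λᵀ only by sign changes in the last row and column (apart from the corner entry), which is exactly the pattern displayed in (4).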

Definition 2.26 (General Lorentz group L^GH). The set of all general (homogeneous) Lorentz transformations forms a group under matrix multiplication. This group is called the general (homogeneous) Lorentz group and is denoted by L^GH.

Physically, not all transformations in L^GH are of interest. The most important ones are those which preserve the time orientation and, in addition, the orientation of the spatial part of Minkowski space. We now define these special transformations inside L^GH.

Setting c = d = 4 in Eq. (1), one obtains (Λ⁴₄)² = 1 + (Λ¹₄)² + (Λ²₄)² + (Λ³₄)², which implies (Λ⁴₄)² ≥ 1. Consequently,

$$\Lambda^4{}_4 \ge 1 \quad\text{or}\quad \Lambda^4{}_4 \le -1.$$


Definition 2.27. An element Λ of L^GH is said to be orthochronous if Λ⁴₄ ≥ 1, and non-orthochronous if Λ⁴₄ ≤ −1.

Theorem 2.28 ([1] page 18, [2]). Let Λ = (Λ^a_b)_{a,b=1,2,3,4} be an element of L^GH and {e_a}_{a=1,2,3,4} an orthonormal basis for M. Then the following are equivalent:

(a) Λ is orthochronous,

(b) Λ preserves the time orientation of all nonzero null vectors, i.e., if v = v^a e_a is a nonzero null vector, then the numbers v⁴ and v̂⁴ = Λ⁴_b v^b have the same sign,

(c) Λ preserves the time orientation of all timelike vectors.

Notice that if Λ is non-orthochronous, then it reverses the time orientation of all timelike and all nonzero null vectors.

The determinant of Λ is found by taking the determinant of both sides of Eq. (3). Since det(Λ^T) = det(Λ) and det(η) = −1 ≠ 0,

$$\det(\Lambda^{T}) \det(\eta) \det(\Lambda) = \det(\eta) \quad\Longrightarrow\quad (\det \Lambda)^2 = 1,$$

so that det(Λ) = 1 or det(Λ) = −1.

Definition 2.29 (Proper and improper Lorentz transformations). We call a Lorentz transformation proper if det Λ = 1 and improper if det Λ = −1.
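The sign of det Λ and the sign of Λ⁴₄ together sort L^GH into four classes. A minimal sketch, assuming the input already satisfies Eq. (3); the diagonal matrices used below (spatial inversion, time reversal and their product) are standard examples, and the helper names are illustrative.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (adequate for 4x4)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def diag(*entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def classify(L):
    """Return (proper/improper, orthochronous/non-orthochronous) for L in L^GH."""
    kind = "proper" if det(L) > 0 else "improper"                    # det L = +1 or -1
    time = "orthochronous" if L[3][3] >= 1 else "non-orthochronous"  # Lambda^4_4 >= 1 or <= -1
    return kind, time


examples = {
    "identity": diag(1, 1, 1, 1),
    "spatial inversion": diag(-1, -1, -1, 1),
    "time reversal": diag(1, 1, 1, -1),
    "total inversion": diag(-1, -1, -1, -1),
}
for name, L in examples.items():
    print(name, classify(L))
```

Only the first class, proper and orthochronous, belongs to the Lorentz group L of Definition 2.30.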

An improper orthochronous Lorentz transformation reverses the spatial orientation, turning a right-handed spatial system into a left-handed one and vice versa. The orthochronous transformations (the time-orientation-preserving elements of L^GH) form a subgroup of L^GH, and this subgroup has two components, consisting of the proper and the improper orthochronous transformations, respectively.

We are now ready for the following important definition.

Definition 2.30 (Lorentz group L). The Lorentz group L is the subgroup of the general Lorentz group L^GH consisting of the proper, orthochronous Lorentz transformations.

As mentioned before, the elements of the Lorentz group are linear transformations of M that preserve the length of vectors, the time orientation and the spatial orientation.

Notice that in the literature the general (homogeneous) Lorentz group is often itself referred to as the Lorentz group.

Further one can define an admissible basis, or a reference frame for an observer, where the Lorentz group will act.

Definition 2.31 (Admissible basis and reference frame). We define an admissible basis for M to be an orthonormal basis {e₁, e₂, e₃, e₄} with e₄ timelike and future-directed, and {e_i} = {e₁, e₂, e₃} spacelike and "right-handed", i.e., satisfying (e₁ × e₂)⋅e₃ = 1.

An admissible basis is also called an admissible frame of reference. Any two such bases (frames) are related by a Lorentz transformation from the Lorentz group L.


The Lorentz group L has an important subgroup R, consisting of the matrices R = (R^a_b) of the form

$$R = \begin{pmatrix} R^1{}_1 & R^1{}_2 & R^1{}_3 & 0 \\ R^2{}_1 & R^2{}_2 & R^2{}_3 & 0 \\ R^3{}_1 & R^3{}_2 & R^3{}_3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

where [R^i_j]_{i,j=1,2,3} is a unimodular orthogonal matrix, i.e., it satisfies det[R^i_j] = 1 and [R^i_j]^T = [R^i_j]⁻¹. The coordinate transformation associated with R corresponds physically to a rotation of the spatial coordinate axes within a given frame of reference ([1] page 21, [5]). For this reason R is called the rotational subgroup of L, and its elements are called rotations in L.

Lemma 2.32 ([1] page 21, [5]). Let Λ = (Λ^a_b)_{a,b=1,2,3,4} be a proper, orthochronous Lorentz transformation. Then the following are equivalent:

(a) Λ is a rotation.

(b) Λ¹₄ = Λ²₄ = Λ³₄ = 0.

(c) Λ⁴₁ = Λ⁴₂ = Λ⁴₃ = 0.

(d) Λ⁴₄ = 1.
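The lemma can be illustrated by embedding a 3 × 3 rotation about the x³-axis in the block form above and comparing it with a boost. This is a sketch under assumed test cases (rotation angle π/6, boost speed 0.6, time coordinate last); `lemma_conditions` is a hypothetical helper that reports conditions (b), (c) and (d) with a floating-point tolerance.

```python
import math


def rotation_x3(theta):
    """Block matrix: rotation about the x^3-axis in the spatial block, 1 in the corner."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def boost_x1(beta):
    """Boost along the x^1-axis, time coordinate last."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, 0, 0, -beta * g], [0, 1, 0, 0], [0, 0, 1, 0], [-beta * g, 0, 0, g]]


def lemma_conditions(L, tol=1e-12):
    """Truth values of conditions (b), (c) and (d) of Lemma 2.32."""
    b = all(abs(L[i][3]) <= tol for i in range(3))  # Lambda^i_4 = 0
    c = all(abs(L[3][i]) <= tol for i in range(3))  # Lambda^4_i = 0
    d = abs(L[3][3] - 1) <= tol                      # Lambda^4_4 = 1
    return b, c, d


print(lemma_conditions(rotation_x3(math.pi / 6)))  # (True, True, True)
print(lemma_conditions(boost_x1(0.6)))             # (False, False, False)
```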

2.4 Timelike Vectors and Curves

In the next two sections we investigate how the journey of an observer O through Minkowski space, represented by a curve in M, behaves and what properties it has. First we look at properties of timelike vectors themselves, and then we go on to study curves through the behavior of their velocity vectors.

The theory in this section has been taken, with some rewriting, from Naber [1], pages 46-156.

Definition 2.33 (Duration τ(v)). For any timelike vector v in M we define the duration τ(v) of v by

$$\tau(v) = \sqrt{-Q(v)}.$$

If v is the displacement vector between two events x₀ and x, i.e. v = x − x₀, then τ(x − x₀) is interpreted physically as the time separation of x₀ and x in any admissible frame of reference in which both events occur at the same spatial location. The duration τ(x − x₀) is also a lower bound for the temporal separation of x₀ and x measured in any admissible frame, and it is called the proper time separation of x₀ and x.
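Computing τ is a one-liner once Q is available. In this sketch (time coordinate last, c = 1, illustrative function name), the vector (0.6, 0, 0, 1) is assumed to describe one unit of coordinate time for a clock moving with speed 0.6 in some admissible frame; its duration 0.8 is smaller than the coordinate time 1, in line with the lower-bound remark above.

```python
import math


def Q(v):
    return v[0] ** 2 + v[1] ** 2 + v[2] ** 2 - v[3] ** 2


def duration(v):
    """Duration tau(v) = sqrt(-Q(v)) of a timelike vector v."""
    q = Q(v)
    if q >= 0:
        raise ValueError("duration is only defined for timelike vectors")
    return math.sqrt(-q)


print(duration([0.0, 0.0, 0.0, 2.0]))  # 2.0: both events at the same spatial location
print(duration([0.6, 0.0, 0.0, 1.0]))  # approximately 0.8
```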

Definition 2.34 (Time axis). A timelike line in Minkowski space which passes through the origin is called a time axis.

If T is a time axis, then there exists an admissible basis {ê_a} for M such that the subspace of M spanned by ê₄ is T. The space Span{ê₄} is then timelike and, by Corollary 2.12, Span{ê₄}^⊥ is spacelike.


Theorem 2.35 (Reversed Schwarz inequality [1] page 48, [2]). If v and ω are timelike vectors in M, then

(v⋅ω)² ≥v²ω²

and equality holds if and only ifv and ω are linearly dependent.

Theorem 2.36 (Reversed triangle inequality [1] page 49, [2]). Let v and ω be timelike vectors with the same time orientation (i.e. v⋅ω < 0). Then

τ(v + ω) ≥ τ(v) + τ(ω), and equality holds if and only if v and ω are linearly dependent.
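Both inequalities can be checked on the classic "twin" configuration. As an assumed example, one twin stays at rest for 2 units of time (the vector v + ω = (0, 0, 0, 2)), while the other travels out with speed 0.6 along v = (0.6, 0, 0, 1) and back along ω = (−0.6, 0, 0, 1); the reversed triangle inequality then says that the travelling twin ages less.

```python
import math


def dot(v, w):
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]


def duration(v):
    return math.sqrt(-dot(v, v))  # v is assumed to be timelike


v = [0.6, 0.0, 0.0, 1.0]           # outward leg
w = [-0.6, 0.0, 0.0, 1.0]          # return leg
s = [a + b for a, b in zip(v, w)]  # stay-at-home twin: (0, 0, 0, 2)

# Reversed Schwarz inequality: (v.w)^2 >= v^2 w^2, strict because v and w are independent.
print(dot(v, w) ** 2, dot(v, v) * dot(w, w))  # about 1.8496 and 0.4096
# Reversed triangle inequality: tau(v + w) >= tau(v) + tau(w).
print(duration(s), duration(v) + duration(w))  # 2.0 and about 1.6
```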

Lemma 2.37 ([1] page 50, [2]). The sum of any finite number of vectors in M, all of which are timelike or null and all future-directed (respectively, past-directed), is timelike and future-directed (respectively, past-directed), except when all of the vectors are null and parallel, in which case the sum is null and future-directed (respectively, past-directed).

Corollary 2.38 ([1] page 50). Let v₁, …, v_n be timelike vectors, all with the same time orientation. Then

$$\tau(v_1 + v_2 + \dots + v_n) \ge \tau(v_1) + \tau(v_2) + \dots + \tau(v_n),$$

and equality holds if and only if v₁, v₂, …, v_n are all parallel.

Corollary 2.39 ([1] page 51,[2]). Let v and ω be two non-parallel null vectors. Then v and ω have the same time orientation if and only if v⋅ω<0.

Definition 2.40 (Curve, worldline). Let I ⊆ R be an open interval. A continuous map α ∶ I → M is a curve in M. A curve in Minkowski space is often called a worldline.

Figure: Illustration of a curve, or worldline, in Minkowski space.


Relative to any admissible basis {e_a} for M, one can write α(t) = x^a(t) e_a for each t ∈ I.

The curve α is smooth if each component function x^a(t) is infinitely differentiable and if α's velocity vector

$$\alpha'(t) = \frac{dx^a}{dt} e_a$$

is nonzero for each t ∈ I.

A curve α ∶ I → M is said to be spacelike, timelike or null if α′(t)⋅α′(t) is > 0, < 0 or = 0, respectively, for each t. A timelike or null curve α is future-directed (respectively past-directed) if α′(t) is future-directed (respectively past-directed) for each t. These notions can also be extended to intervals that contain one or both of their endpoints.

Definition 2.41 (Reparametrization of a curve). If α ∶ I → M is a curve and J ⊆ R is another interval and h∶J →I, t=h(s) is an infinitely diﬀerentiable function with h^′(s)>0 for eachs∈J, then the curveβ =α○h∶J →M is called a reparametrization of α

Definition 2.42 (Proper time length). If α ∶ [a, b] → M is a timelike curve in M, then we define the proper time length of α by

$$L(\alpha) = \int_a^b \left( -\alpha'(t) \cdot \alpha'(t) \right)^{1/2} dt = \int_a^b \sqrt{ -\eta_{cd} \frac{dx^c}{dt} \frac{dx^d}{dt} } \, dt.$$
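The integral can be approximated numerically. The sketch below uses the midpoint rule (an assumed, illustrative choice) on a hypothetical circular worldline α(t) = (R cos ωt, R sin ωt, 0, t) with speed Rω = 0.6; since the speed is constant, the exact proper time length over [0, 10] is √(1 − 0.36) · 10 = 8.

```python
import math


def proper_time_length(velocity, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of sqrt(-alpha'(t).alpha'(t)) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        v = velocity(t)  # alpha'(t), time component last
        q = v[0] ** 2 + v[1] ** 2 + v[2] ** 2 - v[3] ** 2
        if q >= 0:
            raise ValueError(f"curve is not timelike at t = {t}")
        total += math.sqrt(-q) * h
    return total


R, OMEGA = 0.5, 1.2  # hypothetical orbit radius and angular velocity, speed R * OMEGA = 0.6


def circular_velocity(t):
    """Velocity of alpha(t) = (R cos(OMEGA t), R sin(OMEGA t), 0, t)."""
    return [-R * OMEGA * math.sin(OMEGA * t), R * OMEGA * math.cos(OMEGA * t), 0.0, 1.0]


print(proper_time_length(circular_velocity, 0.0, 10.0))  # approximately 8.0
```

Here the integrand is constant, so the midpoint rule is exact up to rounding; for a worldline with varying speed the result is only an approximation whose accuracy depends on n.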

The proper time length L(α) is interpreted, by the Clock Hypothesis, as the time lapse between the events α(a) and α(b) as measured by an ideal standard clock carried along by an observer whose journey through Minkowski space is represented by the curve α.

Theorem 2.43 ([1] page 53, [2]). Let p and q be two points in M. Then p − q is timelike and future-directed if and only if there exists a smooth, future-directed timelike curve α ∶ [a, b] → M such that α(a) = q and α(b) = p.

Implicit in the Clock Hypothesis is the assumption that acceleration has no effect on the rate of an ideal standard clock, i.e., that the instantaneous rate of such a clock depends only on its instantaneous speed and not on the rate at which the speed is changing.

By analogy with the proper time separation between two events, one can define the proper time function (arc length) along a curve.

Definition 2.44 (Proper time function). Let α ∶ I → M be a smooth timelike curve. The proper time function τ(t) on I is defined by

$$\tau = \tau(t) = \int_0^t \left( -\alpha'(u) \cdot \alpha'(u) \right)^{1/2} du.$$

Thus dτ/dt = (−α′(t)⋅α′(t))^{1/2} is positive and infinitely differentiable, since α is smooth and timelike. The inverse t = h(τ) therefore exists and dh/dτ = (dτ/dt)⁻¹ > 0. We conclude that τ is a legitimate parameter along α. Notation: α(τ) = x^a(τ) e_a.

Using the proper time function, we can reparametrize our curve and define an analogue to the velocity vector, namely the world velocity, and world acceleration.

Definition 2.45 (World velocity, 4-velocity). The vector α′(τ) = (dx^a/dτ) e_a of α is called the world velocity (or 4-velocity) of α and is denoted U(τ) = U^a e_a.

Definition 2.46 (World acceleration, 4-acceleration). The second proper time derivative α″(τ) = (d²x^a/dτ²) e_a of α is called the world acceleration (or 4-acceleration) of α and is denoted A(τ) = A^a e_a.


2.4.1 Some properties of the world velocity and acceleration

This example has been taken, with some rewriting, from Naber [1] pages 56 to 58.

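Along a curve, the world velocity U has the following Lorentz scalar products with itself and with the world acceleration A:

$$U \cdot U = -1 \quad\text{and}\quad U \cdot A = 0 \qquad (5)$$

at each point of the curve α. The first identity holds because τ is the proper time (arc length) parameter, and differentiating it with respect to τ gives 2U⋅A = 0.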

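For a given curve, an admissible observer would probably prefer to parametrize the curve by his own time coordinate x⁴ rather than by the proper time τ. This gives

$$\frac{d\tau}{dx^4} = \left( -\alpha'(x^4) \cdot \alpha'(x^4) \right)^{1/2} = \left[ 1 - \left( \left(\frac{dx^1}{dx^4}\right)^2 + \left(\frac{dx^2}{dx^4}\right)^2 + \left(\frac{dx^3}{dx^4}\right)^2 \right) \right]^{1/2} = \sqrt{1 - \beta^2(x^4)} = \gamma(x^4)^{-1},$$

where β(x⁴) is the usual instantaneous speed, relative to the frame S(x¹, x², x³, x⁴), of the observer whose curve is α, and γ = γ(x⁴) = (1 − β²)^{−1/2}. We then get

$$U^i = \frac{dx^i}{d\tau} = \frac{dx^i}{dx^4} \frac{dx^4}{d\tau} = \gamma \frac{dx^i}{dx^4}, \quad i = 1, 2, 3, \qquad\text{and}\qquad U^4 = \frac{dx^4}{d\tau} = \gamma.$$

Thus

$$U = U^a e_a = \gamma \frac{dx^1}{dx^4} e_1 + \gamma \frac{dx^2}{dx^4} e_2 + \gamma \frac{dx^3}{dx^4} e_3 + \gamma e_4,$$

or

$$U = (U^1, U^2, U^3, U^4) = \gamma \left( \frac{dx^1}{dx^4}, \frac{dx^2}{dx^4}, \frac{dx^3}{dx^4}, 1 \right).$$

Similarly, we get the world acceleration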

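$$A^i = \gamma \frac{d}{dx^4}\left( \gamma \frac{dx^i}{dx^4} \right), \quad i = 1, 2, 3, \qquad\text{and}\qquad A^4 = \gamma \frac{d\gamma}{dx^4},$$

so

$$A = (A^1, A^2, A^3, A^4) = \gamma \frac{d}{dx^4} \left( \gamma \frac{dx^1}{dx^4}, \gamma \frac{dx^2}{dx^4}, \gamma \frac{dx^3}{dx^4}, \gamma \right).$$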
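The formula U = γ(dx¹/dx⁴, dx²/dx⁴, dx³/dx⁴, 1) turns an ordinary 3-velocity into a 4-velocity. A minimal sketch, assuming c = 1 and the time component last (the function name is illustrative), confirms numerically that the result satisfies U⋅U = −1 from (5).

```python
import math


def four_velocity(u3):
    """U = gamma * (u^1, u^2, u^3, 1) for a 3-velocity u3 with speed beta < 1."""
    beta2 = sum(c * c for c in u3)
    if beta2 >= 1:
        raise ValueError("speed must be smaller than the speed of light (beta < 1)")
    gamma = 1.0 / math.sqrt(1.0 - beta2)
    return [gamma * c for c in u3] + [gamma]


def dot(v, w):
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]


U = four_velocity([0.3, 0.4, 0.0])  # speed beta = 0.5
print(U)          # gamma = 1/sqrt(0.75), about 1.1547, times (0.3, 0.4, 0, 1)
print(dot(U, U))  # -1 up to rounding
```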


At each fixed point α(τ₀) of a timelike curve α, the world velocity U(τ₀) is a future-directed unit timelike vector. It is often used as the timelike vector e₄ of an admissible basis for M. Relative to such a basis, U(τ₀) = (0, 0, 0, 1). Letting x₀⁴ = x⁴(τ₀), we find

$$\left. \frac{dx^i}{dx^4} \right|_{x^4 = x_0^4} = 0, \qquad i = 1, 2, 3,$$

and so β(x₀⁴) = 0 and γ(x₀⁴) = 1. One says that this frame of reference is "momentarily at rest" (at x⁴ = x₀⁴). Any such frame of reference is called an instantaneous rest frame for α at α(τ₀), and such frames are of importance in physics. In an instantaneous rest frame one has g(A, A) = |a⃗|², where a⃗ = du⃗/dx⁴ is the 3-acceleration in S, i.e. the ordinary derivative of the 3-velocity u⃗ in Eq. (6).

2.5 Spacelike Vectors

The theory in this part has been taken, with some rewriting, from Naber [1], pages 61-63.

If the separation x − x₀ of two events x and x₀ is spacelike, i.e. Q(x − x₀) > 0, then x lies outside the null cone at x₀. In this case there is no admissible basis in which the spatial separation of the two events is zero, i.e., no admissible observer can be present at both events; to do so one would have to travel faster than light.

If in an admissible frame S the time separation Δx⁴ of x and x₀ is some r ∈ R, then an observer in another admissible frame S̃ will, in general, not agree on the temporal order of x and x₀.

Definition 2.47 (Proper spatial separation S). For any two events x and x₀ for which Q(x − x₀) > 0 (i.e., x − x₀ is spacelike), one defines the proper spatial separation S(x − x₀) of x and x₀ by

$$S(x - x_0) = \sqrt{Q(x - x_0)}.$$

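Suppose that the spacelike displacement vector x − x₀ is orthogonal to a timelike line T containing x₀, and let x₁ and x₂ be the events at which T meets the past and the future null cone of x, respectively. Then the proper spatial separation is

$$S(x - x_0) = \tfrac{1}{2} \left( \tau(x_0 - x_1) + \tau(x_2 - x_0) \right) = \tfrac{1}{2} \tau(x_2 - x_1).$$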
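The radar formula can be verified in coordinates. Assume, for illustration, that the observer O is at rest at the spatial origin, so that T is the x⁴-axis; the helper `radar_events` below is hypothetical and returns the emission and reception events x₁ and x₂ at which T meets the null cone of x.

```python
import math


def Q(v):
    return v[0] ** 2 + v[1] ** 2 + v[2] ** 2 - v[3] ** 2


def radar_events(x):
    """Events on T = {(0, 0, 0, s)} lying on the past and future null cone of x."""
    r = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)  # spatial distance of x from T
    return [0.0, 0.0, 0.0, x[3] - r], [0.0, 0.0, 0.0, x[3] + r]


x = [3.0, 4.0, 0.0, 2.0]
x0 = [0.0, 0.0, 0.0, 2.0]        # the point of T with x - x0 orthogonal to T
x1, x2 = radar_events(x)         # emission at x^4 = -3, reception at x^4 = 7
radar = 0.5 * math.sqrt(-Q([b - a for a, b in zip(x1, x2)]))  # (1/2) tau(x2 - x1)
proper = math.sqrt(Q([a - b for a, b in zip(x, x0)]))        # S(x - x0)
print(radar, proper)  # 5.0 5.0
```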

Physically, this can be interpreted as the distance to the event x measured by an admissible observer O whose worldline is T: it is half the proper time that elapses between the emission of a light signal from O towards x and the reception by O of the returning signal.

Suppose now that v and ω are nonzero vectors inM with v⋅ω=0. If v and ω are null, then they must be parallel. If on the other hand v is timelike then ω must be spacelike.

And ifv and ω are both spacelike then their proper spatial lengths satisfy the Pythagorean TheoremS²(v+ω)=S²(v)+S²(ω). This we proved in the Appendix, Exercise 1.5.3.


2.6 Some Classical Examples

2.6.1 Time dilation and relativity of simultaneity

Consider two admissible frames of reference S and Ŝ with admissible bases {e_a} and {ê_a}, respectively. Two events on the worldline of a point at rest in Ŝ satisfy Δx̂¹ = Δx̂² = Δx̂³ = 0, and Δx̂⁴ is the time difference between the two events. The coordinate differences in S are then

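$$\Delta x^b = \Lambda_a{}^b \Delta \hat x^a = \Lambda_4{}^b \Delta \hat x^4.$$

From this and the fact that Λ₄⁴ = Λ⁴₄ is nonzero, it follows that the ratios

$$\frac{\Delta x^i}{\Delta x^4} = \frac{\Lambda_4{}^i}{\Lambda_4{}^4} = -\frac{\Lambda^4{}_i}{\Lambda^4{}_4}, \qquad i = 1, 2, 3,$$

are constant and independent of the particular point at rest in Ŝ. Physically, these ratios are interpreted as the components of the ordinary velocity 3-vector of Ŝ relative to S:

$$\vec u = u^1 e_1 + u^2 e_2 + u^3 e_3, \qquad u^i = \frac{\Lambda_4{}^i}{\Lambda_4{}^4} = -\frac{\Lambda^4{}_i}{\Lambda^4{}_4}, \quad i = 1, 2, 3. \qquad (6)$$

Similarly, the velocity 3-vector of S relative to Ŝ is

$$\hat{\vec u} = \hat u^1 \hat e_1 + \hat u^2 \hat e_2 + \hat u^3 \hat e_3, \qquad \hat u^i = \frac{\Lambda^i{}_4}{\Lambda^4{}_4} = -\frac{\Lambda_i{}^4}{\Lambda_4{}^4}, \quad i = 1, 2, 3.$$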

Next we observe that ∑³i=1�_∆x^∆x⁴ⁱ�² = �Λ⁴₄�⁻²∑³i=1�Λ⁴_i�² = �Λ⁴₄�⁻²��Λ⁴₄�²−1�. Similarly,

∑³i=1�_∆ˆ^∆ˆ_x^x⁴ⁱ�²=�Λ⁴₄�⁻²��Λ⁴₄�²−1�.Physically, we interpret these equalities as asserting that the velocity of ˆS relative to S and the velocity of S relative to ˆS have the same constant magnitude which we shall denote byβ. Thus β²=1−�Λ⁴₄�⁻², in particular, 0≤β² <1, and β=0 if and only if Λ is a rotation. Solving for Λ⁴₄ (assuming orthochronous) yields

Λ⁴₄=Λ₄⁴=�1−β²�^−1�2=γ^−1�2. (7) Definition 2.48 (Direction 3-vector, direction cosine). Assuming that Λ is not a rotation one can define the direction 3-vectord�of Sˆrelative to S by:

$$\vec u = \beta\vec d = \beta\left(d^1 e_1 + d^2 e_2 + d^3 e_3\right), \qquad d^i = u^i/\beta, \tag{8}$$

where the $d^i$ are the direction cosines of the directed line segment along which the observer in $S$ sees $\hat S$ moving. Similarly, the direction 3-vector (and direction cosines) of $S$ relative to $\hat S$ is defined by

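$$\hat{\vec u} = \beta\hat{\vec d} = \beta\left(\hat d^1 \hat e_1 + \hat d^2 \hat e_2 + \hat d^3 \hat e_3\right), \qquad \hat d^i = \hat u^i/\beta.$$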
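Equations (6), (7) and (8) can be exercised numerically. The sketch below assumes that a Lorentz matrix is stored as nested lists with `L[a-1][b-1]` holding $\Lambda^a{}_b$. It builds Λ as a rotation composed with a boost along a unit direction `n` (the standard pure-boost matrix, which the text has not derived and which is used here as an assumption), reads off $\vec u$ and $\hat{\vec u}$ from the last row and column, and recovers β and the direction cosines. All helper names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def boost(beta, n):
    """Pure boost with speed beta along the Euclidean unit 3-vector n.
    L[a][b] stores Lambda^a_b, with indices 0..3 standing for 1..4."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    L = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            L[i][j] = (1.0 if i == j else 0.0) + (g - 1.0) * n[i] * n[j]
        L[i][3] = L[3][i] = -beta * g * n[i]
    L[3][3] = g
    return L

def rotation_about_x3(phi):
    """A rotation in the sense of the text: Lambda^4_4 = 1, orthogonal spatial block."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

n = [2 / 3, 1 / 3, 2 / 3]                 # Euclidean unit vector: 4/9 + 1/9 + 4/9 = 1
Lam = matmul(rotation_about_x3(0.7), boost(0.6, n))

u = [-Lam[3][i] / Lam[3][3] for i in range(3)]      # eq. (6): velocity of S-hat relative to S
u_hat = [Lam[i][3] / Lam[3][3] for i in range(3)]   # velocity of S relative to S-hat
beta = math.sqrt(1.0 - Lam[3][3] ** -2)             # from beta^2 = 1 - (Lambda^4_4)^(-2)
d = [ui / beta for ui in u]                         # direction cosines, eq. (8)

print(beta, math.hypot(*u), math.hypot(*u_hat))     # all three agree (0.6 up to rounding)
print(d)                                            # recovers n up to rounding
```

Because the rotation leaves $e_4$ fixed, it changes $\hat{\vec d}$ but not $\vec d$ or β, which is why both velocity magnitudes still agree.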


Comparing (6) and (8) and using (7) we obtain

$$\Lambda_4{}^i = -\Lambda^4{}_i = \beta\left(1-\beta^2\right)^{-1/2} d^i, \qquad i = 1,2,3, \tag{9}$$

and similarly

$$\Lambda^i{}_4 = -\Lambda_i{}^4 = \beta\left(1-\beta^2\right)^{-1/2} \hat d^i, \qquad i = 1,2,3. \tag{10}$$

Equations (7), (9) and (10) give the last row and column of Λ in terms of physically measurable quantities. From (2) we obtain

$$\Delta\hat x^4 = -\beta\gamma\left(d^1\Delta x^1 + d^2\Delta x^2 + d^3\Delta x^3\right) + \gamma\,\Delta x^4$$

for any two events. In the special case of two events on the worldline of a point at rest in $S$ ($\Delta x^1 = \Delta x^2 = \Delta x^3 = 0$), this gives

$$\Delta\hat x^4 = \gamma\,\Delta x^4 = \frac{1}{\sqrt{1-\beta^2}}\,\Delta x^4.$$

In particular, $\Delta\hat x^4 = \Delta x^4$ if and only if Λ is a rotation. Any relative motion of $S$ and $\hat S$ gives rise to a time dilation effect: for $\Delta x^4 > 0$ we have $\Delta\hat x^4 > \Delta x^4$.
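As a small numerical illustration of time dilation, the Python sketch below evaluates the formula for $\Delta\hat x^4$ derived above. The function name `hat_time_difference` and the sample direction cosines are hypothetical. For $\beta = 0.6$ one has $\gamma = 1.25$, so a separation $\Delta x^4 = 4$ on the worldline of a clock at rest in $S$ becomes $\Delta\hat x^4 = 5$ up to floating-point rounding.

```python
import math

def hat_time_difference(beta, d, dx):
    """Delta hat-x^4 = -beta*gamma*(d . Delta x) + gamma * Delta x^4,
    where d holds the direction cosines and dx = (Dx1, Dx2, Dx3, Dx4)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    spatial = sum(di * dxi for di, dxi in zip(d, dx[:3]))
    return -beta * gamma * spatial + gamma * dx[3]

d = [0.6, 0.8, 0.0]                  # direction cosines: 0.36 + 0.64 = 1
at_rest_in_S = [0.0, 0.0, 0.0, 4.0]  # two events on the worldline of a clock at rest in S

for beta in (0.0, 0.6, 0.99):
    print(beta, hat_time_difference(beta, d, at_rest_in_S))
# beta = 0 (a rotation) leaves Delta x^4 unchanged; otherwise Delta hat-x^4 > Delta x^4.
```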

Another special case is also interesting, namely when two events are simultaneous in S, i.e., ∆x⁴ =0. Then

$$\Delta\hat x^4 = -\beta\gamma\left(d^1\Delta x^1 + d^2\Delta x^2 + d^3\Delta x^3\right).$$

If $\beta \neq 0$, then in general $\Delta\hat x^4 \neq 0$, meaning that the two events are not simultaneous in $\hat S$. The two observers agree on the simultaneity of the events if and only if their spatial separation has a very special relation to the direction along which $\hat S$ is moving, namely, it is orthogonal to that direction:

d¹∆x¹+d²∆x²+d³∆x³=0.

This phenomenon is called the relativity of simultaneity.
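The same formula makes the relativity of simultaneity concrete. In this sketch (hypothetical helper name and sample values, $\beta = 0.6$ and $\vec d = 0.6e_1 + 0.8e_2$), two events that are simultaneous in $S$ and one unit apart along $x^1$ get $\Delta\hat x^4 = -0.6 \cdot 1.25 \cdot 0.6 = -0.45$, while a separation orthogonal to $\vec d$ keeps them simultaneous in $\hat S$.

```python
import math

def hat_time_difference(beta, d, dx):
    """Delta hat-x^4 = -beta*gamma*(d . Delta x) + gamma * Delta x^4 (the formula in the text)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    spatial = sum(di * dxi for di, dxi in zip(d, dx[:3]))
    return -beta * gamma * spatial + gamma * dx[3]

beta, d = 0.6, [0.6, 0.8, 0.0]

along_x1 = [1.0, 0.0, 0.0, 0.0]       # simultaneous in S, separated along x^1
orthogonal = [0.8, -0.6, 5.0, 0.0]    # simultaneous in S, d . Delta x = 0

print(hat_time_difference(beta, d, along_x1))     # nonzero: not simultaneous in S-hat
print(hat_time_difference(beta, d, orthogonal))   # zero: still simultaneous in S-hat
```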

2.6.2 Special Lorentz Transformation and Boosts

Consider the Lorentz transformations whose direction cosines are given by $d^1 = 1$, $\hat d^1 = -1$ and $d^2 = \hat d^2 = d^3 = \hat d^3 = 0$, so that the direction vectors are $\vec d = e_1$ and $\hat{\vec d} = -\hat e_1$. This corresponds to the situation where the observer $S$ sees $\hat S$ moving in the positive $x^1$-direction, and $\hat S$ sees $S$ moving in the negative $\hat x^1$-direction. The origins of both systems coincide at $x^4 = \hat x^4 = 0$, and two of the three spatial coordinates are the same in both frames of reference.

Now, from Eqs. (7), (9) and (10) we find that this Lorentz transformation matrix Λ must have the form

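$$\Lambda = \begin{pmatrix}
\Lambda^1{}_1 & \Lambda^1{}_2 & \Lambda^1{}_3 & -\beta\gamma \\
\Lambda^2{}_1 & \Lambda^2{}_2 & \Lambda^2{}_3 & 0 \\
\Lambda^3{}_1 & \Lambda^3{}_2 & \Lambda^3{}_3 & 0 \\
-\beta\gamma & 0 & 0 & \gamma
\end{pmatrix},$$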


and with the use of the orthogonality conditions (1), Λ must take the form

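$$\Lambda = \begin{pmatrix}
\gamma & 0 & 0 & -\beta\gamma \\
0 & \Lambda^2{}_2 & \Lambda^2{}_3 & 0 \\
0 & \Lambda^3{}_2 & \Lambda^3{}_3 & 0 \\
-\beta\gamma & 0 & 0 & \gamma
\end{pmatrix},$$

where $\left(\Lambda^i{}_j\right)_{i,j=2,3}$ is an orthogonal $2\times 2$ matrix, that is, a rotation of the plane $\mathbb{R}^2$ (possibly combined with a reflection when Λ is not proper).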

Definition 2.49 (Special Lorentz Transformation). Any Lorentz transformation with $\Lambda^2{}_4 = \Lambda^3{}_4 = \Lambda^4{}_2 = \Lambda^4{}_3 = 0$ and $\left(\Lambda^i{}_j\right)_{i,j=2,3}$ equal to the $2\times 2$ identity matrix is called a special Lorentz transformation. In matrix form it is written as

$$\Lambda = \begin{pmatrix}
\gamma & 0 & 0 & -\beta\gamma \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-\beta\gamma & 0 & 0 & \gamma
\end{pmatrix}, \tag{11}$$

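with the associated coordinate transformation

$$\begin{aligned}
\hat x^1 &= \left(1-\beta^2\right)^{-1/2} x^1 - \beta\left(1-\beta^2\right)^{-1/2} x^4, \\
\hat x^2 &= x^2, \\
\hat x^3 &= x^3, \\
\hat x^4 &= -\beta\left(1-\beta^2\right)^{-1/2} x^1 + \left(1-\beta^2\right)^{-1/2} x^4.
\end{aligned} \tag{12}$$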

The inverse is

$$\Lambda^{-1} = \begin{pmatrix}
\gamma & 0 & 0 & \beta\gamma \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\beta\gamma & 0 & 0 & \gamma
\end{pmatrix},$$

and the corresponding coordinate transformation is

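$$\begin{aligned}
x^1 &= \left(1-\beta^2\right)^{-1/2} \hat x^1 + \beta\left(1-\beta^2\right)^{-1/2} \hat x^4, \\
x^2 &= \hat x^2, \\
x^3 &= \hat x^3, \\
x^4 &= \beta\left(1-\beta^2\right)^{-1/2} \hat x^1 + \left(1-\beta^2\right)^{-1/2} \hat x^4.
\end{aligned}$$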
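The special Lorentz transformation (11) and its inverse can be checked directly. The Python sketch below stores $\Lambda^a{}_b$ as `L[a-1][b-1]` and uses $\eta = \mathrm{diag}(1,1,1,-1)$. It verifies the matrix form $\Lambda^{T}\eta\Lambda = \eta$ of the orthogonality conditions, that replacing β by −β gives the inverse, and that the coordinate change (12) preserves $x\cdot x$. The helper names and the sample value β = 0.8 are hypothetical.

```python
import math

ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]   # Minkowski metric, signature (+, +, +, -)

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def special_lorentz(beta):
    """The matrix (11); entry [a][b] stores Lambda^a_b with indices 0..3 for 1..4."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, 0.0, 0.0, -beta * g],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [-beta * g, 0.0, 0.0, g]]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))

def apply(L, x):
    """Coordinate change hat-x^a = Lambda^a_b x^b, i.e. equation (12) for L = Lambda(beta)."""
    return [sum(L[a][b] * x[b] for b in range(4)) for a in range(4)]

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2] - x[3] * y[3]

I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
L = special_lorentz(0.8)

print(close(matmul(transpose(L), matmul(ETA, L)), ETA))   # True: orthogonality
print(close(matmul(L, special_lorentz(-0.8)), I4))        # True: inverse is beta -> -beta

x = [3.0, 1.0, 2.0, 5.0]
x_hat = apply(L, x)
x_back = apply(special_lorentz(-0.8), x_hat)              # the inverse coordinate change
print(dot(x, x), dot(x_hat, x_hat))                       # both -11 up to rounding
```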

If we allow $-1 < \beta < 1$, then, by choosing $\beta > 0$ when $\Lambda^1{}_4 < 0$ and $\beta < 0$ when $\Lambda^1{}_4 > 0$, all special Lorentz transformations can be written in the form (11).

Definition 2.50 (Boost). For each real number β with $-1 < \beta < 1$ we define $\gamma = \gamma(\beta) = \left(1-\beta^2\right)^{-1/2}$ and

$$\Lambda(\beta) = \begin{pmatrix}
\gamma & 0 & 0 & -\beta\gamma \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-\beta\gamma & 0 & 0 & \gamma
\end{pmatrix}.$$

The matrix Λ(β) is called a boost in the $x^1$-direction.


The composition of two boosts in the x¹-direction is another boost in the x¹-direction.

Since, moreover, $\Lambda(\beta)^{-1} = \Lambda(-\beta)$, the collection of boosts in the $x^1$-direction forms a subgroup of the Lorentz group $\mathcal{L}$.

Note, however, that the composition of two boosts in two different directions is, in general, not equivalent to a single boost in any direction.
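A quick numerical symptom of this remark: in the convention $\Lambda^a{}_b$ used here, every pure boost matrix, in any direction, is symmetric, so a product of boosts that is not symmetric cannot be a single boost (it also contains a rotation). The Python sketch below, with a hypothetical helper `boost_along`, compares the product of two boosts along $x^1$ with the product of a boost along $x^1$ and a boost along $x^2$.

```python
import math

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def boost_along(axis, beta):
    """Boost with speed beta along the x^(axis+1)-axis (axis = 0, 1 or 2)."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    L = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    L[axis][axis] = L[3][3] = g
    L[axis][3] = L[3][axis] = -beta * g
    return L

def is_symmetric(A, tol=1e-12):
    return all(abs(A[i][j] - A[j][i]) < tol for i in range(4) for j in range(4))

same_axis = matmul(boost_along(0, 0.6), boost_along(0, 0.6))
mixed = matmul(boost_along(0, 0.6), boost_along(1, 0.6))

print(is_symmetric(same_axis), is_symmetric(mixed))   # True False
print(mixed[0][1], mixed[1][0])                        # beta^2 gamma^2 = 0.5625 (up to rounding) vs 0.0
```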

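Suppose now that $-1 < \beta_1 \le \beta_2 < 1$. Then

$$\left|\frac{\beta_1+\beta_2}{1+\beta_1\beta_2}\right| < 1 \tag{13}$$

and

$$\Lambda(\beta_1)\,\Lambda(\beta_2) = \Lambda\!\left(\frac{\beta_1+\beta_2}{1+\beta_1\beta_2}\right). \tag{14}$$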

Both Eq. (13) and Eq. (14) are proven in the Appendix, Exercise 1.3.14.
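A numerical spot check of (13) and (14), storing $\Lambda^a{}_b$ as `L[a-1][b-1]`. The helper names and the sample pairs are hypothetical, and a check on a few values is no substitute for the proof in the Appendix.

```python
import math

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def boost(beta):
    """The boost Lambda(beta) in the x^1-direction (Definition 2.50)."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, 0.0, 0.0, -beta * g],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [-beta * g, 0.0, 0.0, g]]

def compose_speeds(b1, b2):
    """(beta_1 + beta_2) / (1 + beta_1 beta_2), the argument on the right of (14)."""
    return (b1 + b2) / (1.0 + b1 * b2)

def close(A, B):
    return all(math.isclose(A[i][j], B[i][j], rel_tol=1e-9, abs_tol=1e-12)
               for i in range(4) for j in range(4))

samples = [(0.5, 0.5), (-0.9, 0.3), (0.99, 0.99)]
for b1, b2 in samples:
    b12 = compose_speeds(b1, b2)
    print(b1, b2, b12,
          abs(b12) < 1,                                      # inequality (13)
          close(matmul(boost(b1), boost(b2)), boost(b12)))   # identity (14)
```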

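The physical interpretation is the following: if the velocity of $\hat S$ relative to $S$ is $\beta_1$ and the velocity of $\hat{\hat S}$ relative to $\hat S$ is $\beta_2$ (both along the $x^1$-axis), then the velocity of $\hat{\hat S}$ relative to $S$ is not $\beta_1+\beta_2$, but rather

$$\frac{\beta_1+\beta_2}{1+\beta_1\beta_2},$$

which, when $\beta_1\beta_2 > 0$, is strictly smaller in magnitude than $\beta_1+\beta_2$; the two coincide exactly when $\beta_1\beta_2 = 0$. Equation (14) is called the relativistic addition of velocities formula ([1], page 29; [2]).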
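To see the size of the effect, the short sketch below compares the naive sum $\beta_1+\beta_2$ with the relativistic composition for a few hypothetical sample values. The mixed-sign pair (0.5, −0.2) illustrates that when $\beta_1\beta_2 < 0$ the composed velocity can exceed the naive sum.

```python
def compose_speeds(b1, b2):
    """Relativistic composition (beta1 + beta2) / (1 + beta1 * beta2) of collinear velocities."""
    return (b1 + b2) / (1.0 + b1 * b2)

pairs = [(0.5, 0.5), (0.9, 0.9), (0.5, -0.2), (0.0, 0.7)]
for b1, b2 in pairs:
    print(f"{b1:+.1f} and {b2:+.1f}: naive {b1 + b2:+.4f}, "
          f"relativistic {compose_speeds(b1, b2):+.4f}")
```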

Even though velocities do not add directly, one can define a velocity parameter θ that is additive: if the velocity parameter of $\hat S$ relative to $S$ is $\theta_1$ and that of $\hat{\hat S}$ relative to $\hat S$ is $\theta_2$, then the velocity parameter of $\hat{\hat S}$ relative to $S$ is $\theta_1+\theta_2$. The velocity β is then related to θ by a one-to-one function, $\beta = f(\theta)$. Additivity and (14) require that $f$ satisfy the functional equation

$$f(\theta_1+\theta_2) = \frac{f(\theta_1)+f(\theta_2)}{1+f(\theta_1)\,f(\theta_2)}.$$

This is reminiscent of the addition formula for the hyperbolic tangent, which suggests the change of variable

β=tanh(θ) or θ=tanh⁻¹β.
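The additivity of θ is easy to test with Python's `math.atanh` and `math.tanh`. For the sample values $\beta_1 = 0.6$ and $\beta_2 = 0.8$ one has $\theta_1 = \tanh^{-1}0.6 = \ln 2$ and $\theta_2 = \tanh^{-1}0.8 = \ln 3$, and both $\tanh(\theta_1+\theta_2) = \tanh(\ln 6)$ and $(0.6+0.8)/(1+0.48)$ equal $35/37$. The sample values and the helper name are illustrative only.

```python
import math

def compose_speeds(b1, b2):
    """Relativistic composition of collinear velocities, as in (14)."""
    return (b1 + b2) / (1.0 + b1 * b2)

b1, b2 = 0.6, 0.8
theta1, theta2 = math.atanh(b1), math.atanh(b2)   # velocity parameters ln 2 and ln 3

# Velocity parameters add; tanh maps their sum back to the composed velocity.
print(math.tanh(theta1 + theta2), compose_speeds(b1, b2))   # both approximately 35/37

# The functional equation with f = tanh, for a few sample parameter pairs.
for t1, t2 in [(0.3, 1.7), (-2.0, 0.5), (4.0, 4.0)]:
    lhs = math.tanh(t1 + t2)
    rhs = (math.tanh(t1) + math.tanh(t2)) / (1.0 + math.tanh(t1) * math.tanh(t2))
    print(t1, t2, math.isclose(lhs, rhs, rel_tol=1e-12))
```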

The hyperbolic form of the Lorentz transformation Λ(β) ([1], page 30; [2]), proven in the Appendix, Exercise 1.3.16, is

$$L(\theta) = \begin{pmatrix}
\cosh\theta & 0 & 0 & -\sinh\theta \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-\sinh\theta & 0 & 0 & \cosh\theta
\end{pmatrix}.$$
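A numerical check of the hyperbolic form, storing $\Lambda^a{}_b$ as `L[a-1][b-1]`: $L(\theta)$ agrees with $\Lambda(\tanh\theta)$ because $\gamma(\tanh\theta) = \cosh\theta$ and $\tanh\theta\,\gamma(\tanh\theta) = \sinh\theta$, and in the parameter θ the composition of boosts becomes addition. The helper names and sample values are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def boost_beta(beta):
    """Lambda(beta) from Definition 2.50."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, 0.0, 0.0, -beta * g], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [-beta * g, 0.0, 0.0, g]]

def boost_theta(theta):
    """The hyperbolic form L(theta)."""
    c, s = math.cosh(theta), math.sinh(theta)
    return [[c, 0.0, 0.0, -s], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [-s, 0.0, 0.0, c]]

def close(A, B):
    return all(math.isclose(A[i][j], B[i][j], rel_tol=1e-9, abs_tol=1e-12)
               for i in range(4) for j in range(4))

theta = 1.3
print(close(boost_theta(theta), boost_beta(math.tanh(theta))))              # True
print(close(matmul(boost_theta(0.4), boost_theta(0.9)), boost_theta(1.3)))  # True
```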

The boosts and the rotations, as subgroups, are the building blocks of the Lorentz group, as expressed in the following theorem.


Theorem 2.51 ([1], page 30). Let $\Lambda = \left(\Lambda^a{}_b\right)_{a,b=1,2,3,4}$ be a proper, orthochronous Lorentz transformation. Then there exist a real number θ and two rotations $R_1, R_2 \in \mathcal{R}$ such that $\Lambda = R_1\,L(\theta)\,R_2$.

The physical interpretation of Theorem 2.51 is that a Lorentz transformation from $S$ to $\hat S$ can be accomplished by (1) rotating the spatial axes of $S$ so that the positive $x^1$-direction coincides with the direction of motion of $\hat S$ relative to $S$, (2) boosting to the speed of $\hat S$ relative to $S$, and (3) rotating the spatial axes until they coincide with those of $\hat S$.
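One consequence of Theorem 2.51 is easy to check numerically: since rotations leave $e_4$ fixed, the factorization $\Lambda = R_1L(\theta)R_2$ forces $\Lambda^4{}_4 = \cosh\theta$, so the boost parameter can be read off as $\theta = \cosh^{-1}\Lambda^4{}_4$. The sketch below builds Λ from two arbitrarily chosen rotations (hypothetical choices) and a boost, then recovers θ; it does not compute the rotations $R_1$, $R_2$ themselves.

```python
import math

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def rotation(axis, phi):
    """Rotation of the spatial axes about the x^(axis+1)-axis; it leaves e_4 fixed."""
    i, j = [k for k in range(3) if k != axis]
    c, s = math.cos(phi), math.sin(phi)
    R = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]
    R[i][i], R[i][j], R[j][i], R[j][j] = c, -s, s, c
    return R

def boost_theta(theta):
    """The hyperbolic form L(theta)."""
    c, s = math.cosh(theta), math.sinh(theta)
    return [[c, 0.0, 0.0, -s], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [-s, 0.0, 0.0, c]]

theta = 0.75
R1, R2 = rotation(2, 0.4), rotation(0, -1.1)      # arbitrary rotations for illustration
Lam = matmul(R1, matmul(boost_theta(theta), R2))

print(Lam[3][3] >= 1.0)                  # orthochronous: Lambda^4_4 >= 1
print(math.acosh(Lam[3][3]), theta)      # the boost parameter is recovered
```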

3 Particles and Electromagnetic Fields

In this chapter we shall investigate how particles and electromagnetic fields are described in Minkowski space. The first section defines general properties of particles. In the second section we see how the charge of a particle defines an electric and a magnetic field in Minkowski space.

3.1 Particles

All the definitions in this section are taken from Naber [1], pages 87-91.

Definition 3.1 (Material particle, proper mass). A material particle in M is a pair (α, m), where α: I → M is a timelike curve parametrized by proper time τ, and m is a positive real number called the proper mass of the particle.

One interprets the curve α as the trajectory of the particle.

Definition 3.2 (Free material particle). A free material particle is a material particle for which α has the form α(τ) = x₀ + τU for some fixed event x₀ and some fixed unit timelike vector U, which is then the (constant) world velocity of the particle.

Definition 3.3 (World Momentum). The world momentum (or energy-momentum) of a material particle (α, m) is denoted by P and is defined by

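$$P = P(\tau) = m\,U(\tau),$$

where $U(\tau) = \alpha'(\tau)$ is the world velocity of the particle. Notice that $P\cdot P = m^2\,U\cdot U = -m^2$, since $U$ is a unit timelike vector.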
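The identity $P\cdot P = -m^2$ can be confirmed numerically. The sketch below assumes, consistent with the component form $P = m\gamma(\vec u, 1)$ of Definition 3.4, that a particle with ordinary velocity $\vec u$ relative to $\{e_a\}$ has world velocity $U = \gamma(\vec u, 1)$; the sample mass, velocity and helper names are hypothetical. With $\|\vec u\|^2 = 0.5$ one gets $\gamma = \sqrt 2$.

```python
import math

def minkowski_dot(v, w):
    """Lorentz inner product, signature (+, +, +, -); index 3 holds the x^4 component."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]

def world_velocity(u):
    """Unit timelike vector gamma * (u, 1) for an ordinary 3-velocity u with |u| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    return [gamma * c for c in u] + [gamma]

m = 2.0
U = world_velocity([0.3, -0.4, 0.5])
P = [m * c for c in U]                              # world momentum P = m U

print(minkowski_dot(U, U), minkowski_dot(P, P))     # -1 and -m^2 = -4, up to rounding
```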

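Definition 3.4 (Relative 3-momentum). In an arbitrary admissible basis $\{e_a\}$ the world momentum is $P = P^a e_a$, or in other notation

$$P = \left(P^1, P^2, P^3, P^4\right) = m\gamma\left(\vec u, 1\right) = \left(\vec p, m\gamma\right),$$

where $\vec u$ is the ordinary velocity of the particle relative to $\{e_a\}$, with speed $\beta = \|\vec u\|$ and $\gamma = \left(1-\beta^2\right)^{-1/2}$, and $\vec p = \left(P^1, P^2, P^3\right)$ is called the relative 3-momentum. Its magnitude is given by the Euclidean norm $\|\vec p\|^2 = \left(P^1\right)^2 + \left(P^2\right)^2 + \left(P^3\right)^2$.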

The quantity $m\gamma = m/\sqrt{1-\beta^2}$ is sometimes referred to as the “relativistic mass” of $(\alpha, m)$ relative to $\{e_a\}$.
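The component form translates into a few lines of Python. In this sketch (hypothetical function name, sample values $m = 1$ and $\vec u = 0.6e_1$, so $\gamma = 1.25$) the relative 3-momentum has magnitude $0.75$, the relativistic mass is $1.25$, and $(m\gamma)^2 - \|\vec p\|^2 = m^2$, which is just $P\cdot P = -m^2$ written in components.

```python
import math

def momentum_components(m, u):
    """Return (p, m*gamma) for P = m*gamma*(u, 1) = (p, m*gamma), u the ordinary velocity."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    p = [m * gamma * c for c in u]
    return p, m * gamma

m, u = 1.0, [0.6, 0.0, 0.0]
p, relativistic_mass = momentum_components(m, u)
p_norm = math.sqrt(sum(c * c for c in p))          # Euclidean norm of the 3-momentum

print(p_norm, relativistic_mass)                   # 0.75 and 1.25, up to rounding
print(relativistic_mass ** 2 - p_norm ** 2)        # m^2 = 1, up to rounding
```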
