A fast and efficient algorithm to compute BPX- and overlapping preconditioner for adaptive 3D-FEM

(1)

Tino Eibner

A fast and efficient algorithm to compute

BPX- and overlapping preconditioner for

adaptive 3D-FEM

CSC/08-02

(2)

Impressum:

Chemnitz Scientific Computing Preprints — ISSN 1864-0087 (1995–2005: Preprintreihe des Chemnitzer SFB393)

Herausgeber: Professuren f¨ur

Numerische und Angewandte Mathematik an der Fakult¨at f¨ur Mathematik

der Technischen Universit¨at Chemnitz

Postanschrift:

TU Chemnitz, Fakult¨at f¨ur Mathematik 09107 Chemnitz

Sitz:

(3)

Chemnitz Scientific Computing

Preprints

Tino Eibner

A fast and efficient algorithm to compute

BPX- and overlapping preconditioner for

adaptive 3D-FEM

CSC/08-02

Abstract

In this paper we consider the well-known BPX-preconditioner in con-junction with adaptive FEM. We present an algorithm which enables us to compute the preconditioner with optimal complexity by a total of only O(DoF) additional memory. Furthermore, we show how to com-bine the BPX-preconditioner with an overlapping Additive-Schwarz-preconditioner to obtain a Additive-Schwarz-preconditioner for finite element spaces with arbitrary polynomial degree distributions. Numerical examples illus-trate the efficiency of the algorithms.

(4)

(5)

1 Introduction

Adaptive FEM is one of the most powerful tool to solve boundary value problems. Whether we consider low order adaptive h-FEM [AO00, BS01, BR03, Ver96] or adaptive hp-FEM [AS97,AS98,AS99,Eib06,HS05,MW01], in both cases we ob-tain approximate solutions of high accuracy for a minimum of degrees of freedom. However, in order to solve the arrising system of linear equations efficiently we have to apply good preconditioners. In the last twenty years a lot of precondition-ers leading to optimal condition number O(1) or almost optimal condition num-bers have been developed, e.g. see [BPX91,EM07,HKK05,KL04,Nep86,TW05]. In this paper we will not develop a new preconditioner. Instead we consider the well-known BPX-preconditioner [BPX90,Xu90,Xu92, Zha92] and present a fast and efficent algorithm to compute this preconditioner in the case of adap-tive FEM. Moreover, since the applicability of the BPX-preconditioner is re-stricted to finite element spaces based on piecewise linear functions, we consider the combination of BPX- with an overlapping Additive-Schwarz-preconditioner (see [Osw94, EM07, Eib06, SMPZ08]) to overcome this restriction and obtain a preconditioner for adaptive FEM with arbitrary polynomial degree distribution. For simplicity of exposition, we will restrict our attention to scalar reaction-diffusion equations in 3D and adaptive h-FEM on tetrahedral meshes without hanging nodes.

The key of our implementation is a properly chosen data-structure which enables us to compute the BPX-preconditioner with optimal complexity and a total of only O(DoF) additional memory. To our knowledge, this is a new approach to compute the BPX-preconditioner for an adaptive FEM scenario and we want to emphasise that our implementation can easily be adapted to adaptive hp-FEM and other types of equations such as, for example, elasticity equations. Furthermore, our algorithm does not depend on the dimension of the domain, that is it can be applied to adaptive 2D-FEM on triangular meshes without changes. We also expect that our algorithm can be extented to hexahedral meshes as well as to meshes containing hanging nodes.

2 Model problem and FE-discretization

Let Ω ⊂ R3_{be a Lipschitz domain and assume that its boundary Γ = ∂Ω consists} of two disjoint portions,

Γ = ΓD ∪ ΓN,

where ΓD and ΓN consist of a finite number of disjoint parts of positive measure, or they are empty, respectively. Let A = [aij]3i,j=1 be a coefficient matrix and

(6)

a0 a scalar valued function. Denote by fr the right hand side function and by gN the boundary conditions on ΓN. Then the classical formulation of our model problem reads as follows:

Problem 2.1 (model problem - classical formulation). Find a solution u of −∇ · (A∇u) + a0u = fr in Ω,

u = 0 on ΓD, hA∇u, ηi = gN on ΓN.

Since the more appropriated formulation of this problem is the so called weak formulation, we write

H_D1(Ω) := {u ∈ H1(Ω) | u|ΓD = 0}

and consider henceforward the Galerkin formulation of Problem 2.1. Problem 2.2 (model problem - weak formulation). Find u ∈ H1

D(Ω) such that Z

Ω

∇u · A∇v + a0uvdΩ = Z Ω frvdΩ + Z ΓN gNvdΓ ∀ v ∈ HD1(Ω).

Throughout the paper we demand A = A(x) to be uniformly symmetric positive definite for x ∈ Ω. We demand aij, a0 ∈ L∞(Ω) with a0 ≥ α0 > 0 in the case of ΓD = ∅ and a0 ≥ 0 otherwise. Finally we claim gN ∈ L2(ΓN) and fr ∈ L2(Ω). Due to these assumptions the bilinear form a : H_D1(Ω) × H_D1_{(Ω) 7→ R, given by}

a(u, v) := Z

Ω

∇u · A∇v + a0uvdΩ,

is symmetric, continuous and coercive and the functional f : H1

D(Ω) 7→ R with f (v) := Z Ω frvdΩ + Z ΓN gNvdΓ

belongs to the dual space [H_D1(Ω)]∗. Thus the assumptions of Lax-Milgram [Bra97,Sch98] are satisfied which gives us the existence and uniqeness of a solu-tion for Problem 2.2.

However, in order to solve Problem 2.2 numerically, we have to discretize it. To that end we cover the domain Ω with an admissible shape-regular FE-mesh T consisting of tetrahedra (see, e.g. [Cia76]) and choose a polynomial degree p. On the reference tetrahedron ˆK we denote with Pp( ˆK) the space of polynomials of

(7)

total degree less or equal p and with FK : bK → K we denote the affine linear element map of K ∈ T to ˆK. Then we define finite element spaces

Sp(Ω, T ) := {u ∈ H1(Ω) | u|K◦ FK ∈ Pp( ˆK) ∀K ∈ T }, S_Dp(Ω, T ) := Sp(Ω, T ) ∩ H_D1(Ω)

and replace the infinite dimensional space H1

D(Ω) by V := S p

D(Ω, T ) in Prob-lem (2.2). This gives the finite dimensional problem

Problem 2.3 (Discretized problem). Find u ∈ V such that a(u, v) = f (v) ∀ v ∈ V. Now, if we define an Operator A : V 7→ V∗ via

[Au](·) := a(u, ·)

Problem2.3 becomes equivalent to a finite dimensional operator equation in V∗. Problem 2.4 (Operator equation). Find u ∈ V such that

Au = f. (1)

3 PCG-algorithm

Since the operator A with [Au](v) := a(u, v) is related to a symmetric posi-tive definite bilinear form a(·, ·) we can solve Problem (2.4) by the well-known PCG-algorithm. At this point one usually introduces a basis for the FE-space V (see Remark (3.2) below). Thus, V and the dual space V∗ equiped with the corresponding dual basis become equivalent to RN and the PCG-algorithm can be considered in RN _{using matrix-vector operations. However, for this paper it is} more convenient to postpone the definition of a basis and to consider the PCG-algorithm in a slightly more abstract way, namely as an PCG-algorithm which operates in the FE-space V and its dual space V∗. Before we formulate the algorithm, we define for given u ∈ V a corresponding residuum functional ru ∈ V∗ via

ru := Au − f.

and omit the index u whenever it is clear from the context. Algorithm 3.1 (PCG).

• Choose an initial guess u ∈ V and compute its residuum functional r ∈ V∗_. • Apply a preconditioner C−1 _{: V}∗ _{7→ V:}

(8)

• Compute γ = r(w). • While not converged do

1. v = Aq ∈ V∗ 2. α = −γ/v(q) _{∈ R} 3. u = u + αq ∈ V 4. r = r + αv ∈ V∗ 5. w = C−1_r _{∈ V} 6. γˆ = r(w) _{∈ R} 7. q = w + (ˆγ/γ)q ∈ V 8. γ = ˆγ _{∈ R}

An important issue concerning an iterative solver is its convergence rate. In the case of the PCG-Algorithm (3.1) this convergence rate is strongly related to the precondioner C−1. If we define the energy norm k · kA and the condition number κ of C−1A via

kwk2_A:= a(w, w) and κ := λmax(C −1_A) λmin(C−1A) ,

then it is a well-known fact that with the initial guess u0 _{and the exact solution} u∗ of Au = f the i-th initerate ui of the PCG-algorithm (3.1) satisfies

kui − u∗kA≤ 2 √κ − 1 √ κ + 1 i ku0− u∗kA.

Remark 3.2. Later we introduce a basis Φ = [φ1, . . . , φN] for our FE-space V and the dual basis Ψ = [ψ1, . . . , ψN] for V∗. With such bases each u ∈ V becomes equivalent to a vector u ∈ RN u = N X i=1 u_iφi ∈ V ⇔ u = (u1, . . . , uN)T ∈ RN and, moreover, each f ∈ V∗ _{becomes equivalent to a vector f ∈ R}N

f = N X i=1 f iψi ∈ V ∗ _⇔ f = (f 1, . . . , fN) T _{= (f (φ} 1), . . . , f (φN))T ∈ RN. Hence, since the matrix representation of the operator A is a matrix whose k-th column is the vector representations of Aφk ∈ V∗ with respect to Ψ, we obtain

A : V 7→ V∗ ⇔ A =    a(φ1, φ1) . . . a(φN, φ1) .. . ... a(φ1, φN) . . . a(φN, φN)   ∈ R N,N_.

With these matrix and vector representations we obtain the equivalences v = Aq ⇔ v = Aq, γ = r(w) ⇔ γ = wTr, ... and we end up with the well-known RN-version of the PCG-algorithmn.

(9)

4 Adaptive FEM

Considering Problem2.2 it is not common practice to start with a highly refined mesh to compute an approximate solution. Instead we start on a coarse grid with as few nodes as possible. We compute the corresponding FE-solution and, thereafter, in order to improve the accuracy of the approximation, we employ an a-posteriori error estimator which gives us information about the error dis-tribution and flags all elements with large local error. Next we refine all flagged elements and obtain a finer mesh. We compute a new approximation of higher accuracy, estimate the error for this new approximation and in case the approxi-mation is not sufficently accurate a new iteration of the adaptive loop is begun. Thus, we obtain a sequence of refined meshes together with FE-solutions of in-creasing accuracy. That is we have

coarse grid (0) -stiff. matrix (1) -FE-solution (2) -error est. (3) -refined mesh (4) ?

In this paper we focus on tetrahedral meshes without hanging nodes. A suited algorithm for local mesh refinement which avoids hanging nodes can be found in [AMP00].

5 Additive-Schwarz-Method(ASM)

As we have already seen in Section3, the key to solve the arrising system of linear equations with only a few PCG-iterations is the definition of a good precondi-tioner C. The precondiprecondi-tioners which we will consider are the BPX-precondiprecondi-tioner for the piecewise linear functions of S1(Ω, T ) and an overlapping preconditioner for the higher polynomial degrees of Sp_{(Ω, T ). Both preconditioners can be} con-sidered as members of the class of Additive-Schwarz-preconditioner which means they are completely determined by a not necessarily direct decomposition

V := S_Dp(Ω, T ) = K X

i=0 Vi

of our finite element space V into subspaces Vi. The action of such a precon-ditioner is best described by the realization of C−1r, where r ∈ V∗ denotes the residuum.

Definition 5.1 (ASM-preconditioner). For each subspace Vi find wi ∈ Vi such that

(10)

Then C−1r := K X i=0 wi.

We see, the computation of C−1r is equivalent to solve local versions of the orig-inal problem for all subspaces Vi. However, these subspaces are ordinarily low-dimensional in contrast to the FE-space V.

In order to analyze ASM-preconditioners we define C2

0 such that min ( _K X i=0 a(ui, ui) u = K X i=0 ui, ui ∈ Vi ) ≤ C2 0a(u, u) ∀ u ∈ V (2) and denote with ρ(E) the spectral radius of the matrix E = [eij]Ki,j=1 with entries

eij = sup u∈Vi sup v∈Vj |a(u, v)| pa(u, u)pa(v, v) ∈ [0, 1], (3)

which are the angles between the subspaces V1, ..., VK. With these settings we are able to derive a bound for the condition number of C−1A.

Theorem 5.2.

κ(C−1A) ≤ C2

0(1 + ρ(E)). Proof. See Section 10.

6 MDS-BPX

In the following two sections we will introduce the Multilevel-Diagonal-Scaling-BPX (MDS-Multilevel-Diagonal-Scaling-BPX) as an additive Schwarz preconditioner for V = S1

D(Ω, T ) and describe its implementation in detail.

To that end we assume that we are within an adaptive FE-process (see Section4) which started on a coarse grid T0 and produced a sequence of nested tetrahedral meshes

T0 ⊂ T1 ⊂ . . . ⊂ TM = T . We define

Vm := Set of all free vertices on Tm

:= Set of all vertices of Tm which are not located on ΓD, and we denote with φm

(11)

We define one-dimensional subspaces Vm

v := spanφ m v .

and equip the finite element space V with the basis {φM_v | v ∈ VM_}.

Then, in the context of ASM, the MDS-BPX-preconditioner is defined by the decomposition V = M X m=0 X v∈Vm ∗ V_vm, (4)

where the “∗” indicates that the summation is only over pairwise distinct spaces Vm

v . For {Tm}Mm=0 arising from a total or a boundary concentraded mesh refine-ment it is shown that the BPX-preconditioner yields

κ(C−1A) = O(1) for M → ∞, (5)

independent on the dimension of Ω (see [Eib06, Xu90, Xu92, Zha92]) and for adaptive mesh refinements all numerical examples indicate that (5) is valid too. Due to Definition 5.1 of general ASM-preconditioners, the finite element space decomposition (4) and the fact that all subspaces Vm

v are one-dimensional spaces, we compute for a given residuum r ∈ V∗ the BPX-preconditioner as

C−1r = M X m=0 X v∈Vm ∗ _r(φm v ) a(φm v , φmv ) φm_v . (6)

To illustrate Splitting (4) we want to close this section with a one-dimensional example.

Example 6.1. For Ω = (0, 1), ΓD = ∅, given bilinear form a(·, ·), and given right hand side functional f we want to solve Problem 2.2. Assume we started with a coarse grid T0 and that our adaptive FE-algorithm produced a sequence of meshes T0, ..., T3 as in shown in Figure 1 so far.

According to (4) Example 6.1 induces the splitting V = S1_{(Ω, T} 3) = (V10+ V 0 2) + (V 1 1 + V 1 2 + V 1 3) + (V 2 1 + V 2 3 + V 2 4) + (V 3 1 + V 3 4 + V 3 5). That is, we start with adding up all subspaces of the coarse grid and thereafter, since no subspace of T1 is equivalent to V10 or V20, we have to add all subspaces of T1. When we come to the subspaces of T2 we only add V12, V32, V42 and omit V22 since this subspace is equivalent to V1

2 and hence already a part of our splitting. Finally we consider the subspaces of T3 and it remains to add V13, V43, V53.

(12)

Figure 1: 1D-example of a sequence of adaptively refined meshes 1 1 2 2 2 3 3 φ₁(3) φ₁(0) φ₁(2) φ 1 (1) φ₄(2) _φ 3 (2) φ₃(1) φ 2 (2) φ 2 (1) φ₂(0) φ 2 (3) φ 3 (3) φ 4 (3) φ₅(3) T 0: T 1: T 2: T 3: 1 5 4 3 2 4 1 1 2 2 2 3 3 φ₁(3) φ₁(0) φ₁(2) φ 1 (1) φ₄(2) _φ 3 (2) φ₃(1) _φ 2 (1) φ₂(0) φ 4 (3) φ₅(3) φ 3 (3) _φ 2 (3) φ 2 (2) T 0: T 1: T 2: T 3: 1 3 2 4 5 1 4 1

Remark 6.2. We number the nodes of TM consecutively from 1 to NM such that if Nm denotes the number of nodes of Tm the nodes of Tm correspond to the numbers 1, . . . , Nm for m = 0 . . . M . Thus, due to this isomorphism between the nodes of Tm and the numbers {1, . . . , Nm} we also write φmi instead of φmv for a vertex v with number i. Similarly we may write Vm

i for Vvm.

7 Implementation of the MDS-BPX

Now, after we have seen the definition of the BPX-preconditioner, the remaining question is, how to evaluate the double sum (6) fast and efficiently, which means with at most O(NM) work and at most O(NM) additional memory, independent on the number of mesh refinements.

The key of our implementation is a properly chosen data-structure the so called extended vertex tree table or short EVT-table, which will be introduced in the following subsection.

7.1 Preliminary work

In order to compute C−1r for a given residuum r ∈ V∗ we need r(φm v ) and a(φm_v , φm_v ) for several but not all basis functions φm_v .

Since the summation (4) is only over pairwise distinct spaces V_vm we observe, that starting with all subspaces V0

v for v ∈ V0 we have to add for m = 1, . . . , M exactly all those subspaces Vm

v which correspond to a vertex v, newly created on Tm, or one of its parents. That is, for fixed Tm we need r(φmv ) and a(φmv , φmv ) for all the basis functions φm

v , where v is a newly created vertex or a parent of such a vertex.

(13)

Remark 7.1. Newly created vertex with respect to Tm means that the vertex is created within the refinment process Tm−1 7→ Tm. In the case m = 0 all vertices of T0 are called newly created.

Thus, in order to provide all the required r(φm

v ) and a(φmv , φmv ), we generate an EVT-table synchronously to the adaptive finite element process. Our EVT-table contains one row for each node and each of these rows stores two integers and six real numbers. To be more precisely let us consider the node with number i and assume that this node appears first on mesh Tm. Furthermore we assume that its parents have the numbers p1 and p2. Then the i-th row of our table provides storage space for

r(φm_i ) a(φm_i , φ_im) p1 r(φmp1) a(φ m p1, φ m p1) p2 r(φ m p2) a(φ m p2, φ m p2)

Since the nodes of T0 have no parents the last columns of the first N0 rows remain empty.

All but the r(φ∗_∗) cells have to be initialized once and thereafter its contents will never change again. Thus we call the set of cells which do not contain an r(φ∗_∗) entry static cells and the prelimiary work is to initialize the static cells of our EVT-table.

The static cells of the first N0 rows are initialized directly after setting up the stiffness matrix for the initial mesh T0. For v ∈ V0 we only have to copy a(φ0v, φ0v) from the diagonal of the stiffness matrix into our EVT-table. For v 6∈ V0 _{we set} a(φ0

v, φ0v) := 1. Thereafter the mesh refinement procedure has to record all newly created nodes together with the numbers of its parents and after setting up the stiffness matrix for the refined mesh we run through the rows of our EVT-table associated with the newly created nodes and copy the required entries a(·, ·) from the diagonal of the stiffness matrix into our table or set a(φm_v , φm_v ) := 1 for v 6∈ Vm_{, respectively.}

Considering Example6.1 this yields to:

Table 1: EVT-table for Example 6.1.

N Father1 Father2

T0 1 * a(φ01, φ01) * * * * * * 2 * a(φ0₂, φ0₂) * * * * * * T1 3 * a(φ13, φ13) 1 * a(φ11, φ11) 2 * a(φ12, φ12) T2 4 * a(φ24, φ24) 1 * a(φ21, φ21) 3 * a(φ23, φ23) T3 5 * a(φ35, φ35) 1 * a(φ31, φ31) 4 * a(φ34, φ34)

(Note that only the cells with a white background color are part of our data structure.)

(14)

7.2 Computation of C

−1

r - Step I

To compute C−1r for a given residuum r ∈ V∗ we assume that the preliminary work is completed, that is we have a fully initialized EVT-table, such as Table7.1, at hand. Then the first step to compute C−1r is to fill all the r(φ∗_∗) cells of our EVT-table. Since we use the standard nodal basis for V and its dual basis for V∗ _{the residuum r is given via a vector [r(φ}M

v )]v∈VM (see Remark 3.2). We also

need r(φm_i ) for m 6= M . These values are not given directly but since we know the refinement history, we can compute them recursively from rM_{. At this point} and in order to simplify notation we define the extended version of the residual vector

rm =rm₁ , . . . , rm_N_m with rm_i := r(φ m

i ) : if vertex i belongs to Vm 0 : otherwise

and we assume that our finite element code provides the extended rM _{rather than} the compact version which omits the entries r(φm

v ) for v 6∈ Vm.

Considering Example 6.1 the T3 section of the corresponding EVT-table 7.1tells us: φ2₁ = φ3₁+1 2φ 3 5, φ 2 4 = φ 3 4+ 1 2φ 3 5, and φ 2 i = φ 3 i for i 6∈ {1, 4}. Thus we have r φ2₁ = r φ3₁ +1 2r φ 3 5 , r φ24 = r φ34 + 1 2r φ 3 5 , and r φ2_i = r φ3_i for i 6∈ {1, 4}.

Generally spoken, in order to obtain rm−1 _{from r}m _{we go through the rows i =} Nm−1 + 1, . . . , Nm of our EVT-table and compute for the parents p1 and p2 of vertex i

r_p1+ = 1

2ri if the vertex with number p1 belongs to V

m−1 _and (7) r_p2+ = 1

2ri if the vertex with number p2 belongs to V m−1_.

These operations are performed in place, directly on rm without any auxiliary vector and finally the first Nm−1 positions of the input vector rm contain the entries of rm−1_.

That is, given an initialized EVT-table and the extended residuum vector rM_, we fill all r(φm

(15)

Algorithm 7.2. for m = M, . . . , 0 {

for i = Nm−1+ 1, . . . , Nm

copy r(φm_i ), r(φm_p₁), r(φm_p₂) from rm into the i-th row of our EVT-table if m > 0

compute rm−1 from rm }

Considering Example6.1 once again, we arrive at:

N Father1 Father2

T0 1 r(φ01) a(φ01, φ01) * * * * * * 2 r(φ0

2) a(φ02, φ02) * * * * * *

T1 3 r(φ13) a(φ31, φ13) 1 r(φ11) a(φ11, φ11) 2 r(φ21) a(φ11, φ11) T2 4 r(φ24) a(φ42, φ24) 1 r(φ21) a(φ21, φ21) 3 r(φ32) a(φ21, φ21) T3 5 r(φ35) a(φ53, φ35) 1 r(φ31) a(φ31, φ31) 4 r(φ43) a(φ34, φ34)

7.3 Computation of C

−1

r - Step II

Now with a complete EVT-table at hand we want to add up all ˆ wm_v := r(φm v ) a(φm v , φmv ) φm_v

of sum (6). To that end we start with w = 0. The variable m goes step-by-step from 0 to M and in each step we add all the required contributions ˆwm_v of level m to w. Obviously, the intermediate result

wk := k X m=0 X v∈Vm ∗ ˆ w_vm

is an element of S1(Ω, Tk) and thus its most appropriate representation is the vector wk _{= [w}k 1, . . . , wkNk] with wk = Nk X i=1 wk_iφk_i.

That is, we start with w = [0, 0, ..., 0], the vector representation of w = 0 with respect to the basis {φ0

1, . . . , φ1N0} of S

1_{(Ω, T}

(16)

Algorithm 7.3. for m = 0, . . . , M

(0) Clear all flags. (1) Add P

v∈Vm

∗ ˆ

w_vm to the current w given as w = [wm₁ , . . . , wm_N_m] with

w = Nm X i=1 wm_i φm_i . (2) If m < M

decompose w after the S1(Ω, Tm+1) basis such that

w = Nm+1 X i=1 wm+1_i φm+1_i and set w = [wm+1₁ , . . . , wm+1_N m+1].

Let us consider parts (1) and (2) of Algorithm 7.3 in detail.

In part (1), all we have to do is to go through the rows i = Nm−1+ 1, ..., Nm of our EVT-table. From each row we read the numbers p1 and p2 of the parents of vertex i, compute

wm_i + = r(φm_i )/a(φm_i , φm_i ) if vertex i belongs to Vm ,

wm_p₁+ = r(φm_p₁)/a(φm_p₁, φm_p₁) if vertex p1 is not flagged and belongs to Vm, wm_p₂+ = r(φm_p₂)/a(φm_p₂, φm_p₂) if vertex p2 is not flagged and belongs to Vm,

and finally flag the vertices p1 and p2. All required data can be found in the i-th row of our EVT-table, the flags are necessary to ensure that we add the r(φm

pk)/a(φ

m pk, φ

m

pk) terms only once, see Remark 7.5.

In part (2) we convert the current vector representation of w with respect to the S1(Ω, Tm) basis into a vector representation of w with respect to the S1(Ω, Tm+1) basis. To do so we have to extent the vector representation [wm

1 , . . . , wmNm] to [wm+1₁ , . . . , wm+1_N m+1] with wm+1_i := wm_i for i ≤ Nm 1 2(w m p1 + w m p2) for i ∈ {Nm+ 1, . . . , Nm+1} ,

where p1, p2 are the parents of vertex i. Now we have the desired vector rep-resentation of w with respect to the S1(Ω, Tm+1) basis and once again, all the required data could be obtained directly from the i = Nm+ 1, . . . , Nm+1 rows of our EVT-table.

(17)

Remark 7.4. The extension of w to length Nm+1 is mere formality. In general we start with a vector w of length NM from the very beginning and an extension as above is simply a change of the upper array index.

Remark 7.5. Consider Example 6.1 and assume that we obtain a mesh T4 by subdividing the elements (1, 5) and (4, 5) of T3. That is we get

3 2

4 5

1 6 7

T₄:

and our EVT-table looks like

N Father1 Father2

..

. ... ... ... ...

T4 6 r(φ46) a(φ64, φ46) 1 r(φ41) a(φ41, φ41) 5 r(φ54) a(φ45, φ45) 7 r(φ4

7) a(φ47, φ47) 4 r(φ44) a(φ44, φ44) 5 r(φ54) a(φ45, φ45) Now, adding all w_v4 contributions of (6) means to perform

w+ = X i∈I r(φ5 i) a(φ5 i, φ5i) φ5_i with I = {1, 4, 5, 6, 7}.

However, if we went through the rows of the T4 section of our EVt-table the vertex with number 5 appears twice as a parent and consequently we would add

r(φ5 5) a(φ5 5, φ55) φ5₅

twice. The easiest way to avoid such an undesired doubling is to use a flag for each node.

8 Numerical example

In this section we present some numerical examples to demonstrate the efficiency of our BPX-implementation. All examples start with a coarse grid T0 of the given domain and then we create a sequence of nested meshes by applying the mesh refinement algorithm of [AMP00]. This algorithm is based on the bisection of tetrahedra and allows us to construct locally refined meshes without hanging nodes.

Example 8.1. Let Ω = (0, 1)3, ΓD = (0, 1) × (0, 1) × {0, 1} and ΓN = ∂Ω\ΓD. Let {Ti}Mi=0 be a sequence of nested meshes on Ω. Find u ∈ V := SD1(Ω, TM) such that Z Ω ∇u · ∇v + uvdΩ = Z Ω frvdΩ ∀ v ∈ V, where the right hand side is given by fr = 1 +

P3 i=1x

2 i.

(18)

Table 2: Total mesh refinement, C2

res = 10 −6

M Elements DoF It Total[s] Precond[s] Ratio

0 6 8 7 6.1e-05 1.8e-05 0.29 1 14 10 8 8.4e-05 2.4e-05 0.28 2 30 17 8 1.2e-04 3.3e-05 0.28 3 70 31 9 2.2e-04 5.4e-05 0.24 4 166 58 11 5.0e-04 1.0e-04 0.20 5 414 120 12 5.1e-04 9.1e-05 0.17 6 990 256 15 1.4e-03 2.1e-04 0.14 7 2.420 554 16 3.5e-03 4.3e-04 0.12 8 5.718 1.251 17 8.8e-03 1.1e-03 0.12 9 13.460 2.768 19 2.6e-02 3.4e-03 0.13 10 30.960 6.172 19 6.1e-02 7.7e-03 0.12 11 70.932 13.617 20 1.5e-01 1.9e-02 0.12 12 158.904 30.809 20 3.6e-01 5.1e-02 0.14 13 355.674 65.614 21 8.5e-01 1.3e-01 0.15 14 782.288 146.532 21 1.9e+00 3.1e-01 0.15 15 1.717.898 317.675 21 4.4e+00 7.0e-01 0.16 16 3.723.834 681.607 22 1.0e+01 1.7e+00 0.16 17 8.067.178 1.466.444 22 2.3e+01 3.8e+00 0.16 18 17.284.898 3.191.462 22 5.4e+01 8.5e+00 0.15 19 37.051.776 6.646.901 22 1.4e+02 2.9e+01 0.21

We consider two different scenarios. At first we consider the case where the mesh Ti+1 is derived from Ti by a total mesh refinement. Thereafter, in a second run, we employ a standard residuum error estimator to refine only those elements with high local error and hence obtain a sequence of locally refined meshes.

All iterations start with an initial guess u = 0 and we use the PCG-algorithm in conjunction with our implementation of the BPX-preconditioner to reduce the initial residual (measured in the Euklidean norm) by a given factor Cres. That is we iterate until r2

i < Cres2 r02.

The results of the experiments are collected in Table 2and Table 3.

The column “Elements” shows the number of tetrahedra and “DoF” denotes the dimension of the finite element space S_D1(Ω, TM). The column “It” gives the number of PCG iterations required to reduce the residual by Cres. “Total” is the computing time of the PCG algorithm including the application of the precondi-tioner and the column “Precond” contains the time to execute the precondiprecondi-tioner. Finally the column “Ratio” gives the quotient of “Total” and “Precond”.

(19)

eval-Table 3: Adaptive mesh refinement, C2

res = 10 −8

M Elements DoF It Total[s] Precond[s] Ratio

1 6 8 8 1.1e-04 2.2e-05 0.20 2 11 9 7 7.7e-05 2.0e-05 0.26 3 16 11 9 9.2e-05 2.7e-05 0.29 4 23 14 9 1.1e-04 3.2e-05 0.29 5 32 18 10 1.4e-04 4.4e-05 0.31 6 46 24 10 9.1e-05 3.0e-05 0.33 7 108 39 12 1.7e-04 4.5e-05 0.26 8 182 58 13 2.7e-04 6.4e-05 0.23 9 298 97 14 4.4e-04 9.2e-05 0.21 10 508 152 16 8.1e-04 1.5e-04 0.18 15 8.040 1.694 22 1.7e-02 2.3e-03 0.13 20 108.055 21.227 26 3.3e-01 4.4e-02 0.13 25 1.209.941 227.229 28 4.1e+00 6.9e-01 0.16 30 12.934.018 2.382.662 28 5.3e+01 9.5e+00 0.17

uation takes only a fraction of the total computing time of the PCG-algorithm. Most computing time is required for the multiplication of the stiffness matrix A with a vector q, where in our implementation this multiplication is done by multiplications of the stored local stiffness matrices AK with subvectors qK of q and assemblation of all the local vectors vK = AKqK.

The memory to store our EVT-table is O(NM), where NM denotes the number of vertices of the current mesh TM and considering our algorithm as well as the results of our numerical experiments we see that the computing time to execute the preconditioner is of order O(NM) too. However, this computing time is to some extent affected by memory cache effects.

9 Higher polynomial degrees

Up to now we considered finite element spaces V = S1_{(Ω, T ). However, often it is} useful to apply Sp_{(Ω, T ) with p > 1 or even a hp-version of FEM, where different} elements may have different polynomial degrees. Thus the question arrises how to construct a preconditioner for such a setting where higher polynomial degrees appear.

As in the previous sections, we assume that we have a sequence of nested meshes T0 ⊂ T1 ⊂ . . . ⊂ TM and we define for v ∈ VM the patch ωMv := ∪{K ∈ TM | v ∈ K} which is the closure of the union of all elements K ∈ TM with v as one of its

(20)

vertices. Moreover we define for each v ∈ VM _{a subspace} Sv = {u ∈ Sp(Ω, TM) | supp u ⊂ ωMv }. Then, in the context of ASM, the splitting

Sp(Ω, TM) = S1(Ω, TM) + X

v∈VM

Sv

defines an overlapping preconditioner with κ(C−1A) = O(1). If we decompose S1_{(Ω, T}

M) further, such that Sp(Ω, T ) = M X m=0 X v∈Vm ∗ Vm v + X v∈VM Sv, we obtain a preconditioner for Sp_{(Ω, T}

M) which can be considered as a combina-tion of BPX- and overlapping precondicombina-tioner. Thus, due to Definicombina-tion (5.1) we get for a given residuum r ∈ V∗ the preconditioned residuum C−1r by using the BPX-routine from the previous sections to compute

M X m=0 X v∈Vm ∗ _r(φm v ) a(φm v , φmv ) φm_v and adding wv ∈ Sv for all v ∈ VM, given by

a(wv, u) = r(u) ∀u ∈ Sv.

That is, after applying the BPX-routine we have to solve a linear system of equations for each subspace Sv. However, these linear systems of equations are low-dimensional and for a fixed subspace Sv, the corresponding system matrix is s.p.d. and always the same. Only right hand sides change.

There are two possible strategies to implement the preconditioner

1. Runtime optimal strategy: We compute a Choleky decomposition once for each system matrix of the Sv-subproblems. We store these Choleky de-compositions and reduce solving the linear systems of equations to forward and backward substitutions. Since the polynomial degree p is fixed and the number of tetrahedra sharing a vertex is bounded by a constant, the dimension of Sv is also bounded by a constant and hence we have:

additional memory = O(NM).

2. Memory optimal strategy: We compute the Choleky decompositions anew whenever we have to solve a Sv-subproblem. Since the dimension of Sv is bounded by a constant we have

(21)

Table 4: Results for Example 9.1 p = 1, C2 res= 10 −5 _{p = 3,} _C2 res = 10 −10

Elements DoF Iterations H1-Err DoF Iterations H1-Err

6 8 2.0e-05 0 5.5e-01 64 7.3e-04 6 4.3e-02

14 10 4.1e-05 1 4.1e-01 105 2.6e-03 15 2.4e-02

30 17 4.0e-05 1 3.3e-01 211 7.2e-03 18 1.3e-02

70 31 8.1e-05 2 2.7e-01 443 2.6e-02 29 1.1e-02

166 58 2.8e-04 5 2.0e-01 950 7.8e-02 40 5.6e-03

414 120 6.4e-04 5 1.5e-01 2.248 2.1e-01 51 3.7e-03

990 256 2.7e-03 8 1.1e-01 5.077 3.0e-01 61 1.9e-03

2.420 554 2.2e-03 9 8.6e-02 11.994 8.1e-01 67 1.1e-03 5.718 1.251 5.2e-03 9 6.6e-02 27.796 2.7e+00 75 6.8e-04 13.460 2.768 1.5e-02 10 5.0e-02 63.978 6.7e+00 81 3.9e-04 30.960 6.172 3.9e-02 11 3.8e-02 145.609 1.5e+01 83 2.4e-04 70.932 13.617 9.3e-02 12 2.9e-02 330.319 3.8e+01 91 1.5e-04 158.904 30.809 2.2e-01 12 2.3e-02 736.750 8.7e+01 92 9.3e-05 355.674 65.614 4.9e-01 12 1.7e-02 1.635.004 2.0e+02 97 6.5e-05 782.288 146.532 1.2e+00 13 1.4e-02 3.594.777 4.7e+02 101 4.8e-05 1.717.898 317.675 2.5e+00 12 1.1e-02 7.864.521 1.2e+03 101 3.5e-05

3.723.834 681.607 6.1e+00 13 8.6e-03 — — — —

8.067.178 1.466.444 1.3e+01 13 6.8e-03 — — — —

17.284.898 3.191.462 3.5e+01 13 5.5e-03 — — — —

37.051.776 6.646.901 8.0e+01 13 4.2e-03 — — — —

Let us consider the following example. Example 9.1. Let Ω = (0, 1)3_{, Γ}

D = ∂Ω. Let {Ti}Mi=0 be a sequence of nested meshes on Ω. For p = 1, 2, 3 find u ∈ Sp_{(Ω, T}

M) with u|∂Ω= cos(x) cos(y) cos(z) such that Z Ω ∇u · ∇v + uvdΩ = 4 Z Ω vdΩ ∀ v ∈ S_Dp(Ω, TM).

We start with u = 0 on a coarse grid and apply a total mesh refinement strategy which gives us a sequence of nested meshes. We use the FE-solution of the current mesh as initial guess for the FE-solution on the refined mesh and iterate until the initial residual r0 is reduced by a given factor Cres. That is we iterate until r2_i < C_res2 r₀2. In the first run we consider the finite element space Sp(Ω, TM) for p = 1 and therafter we compute the runtime optimal strategy for p = 3.

The results of the experiments are collected in Table4.

The column “Elements” shows the number of tetrahedra of the considered mesh and “DoF” denotes the dimension of the corresponding finite element space. The column “Iterations” gives the number of PCG iterations and the computing time

(22)

required to reduce the initial residual by Cres. Finally the column “H1-Err” shows the error kuF EM − uexactkH1_(Ω) of our finite element solution.

Since the exact solution u = cos(x) cos(y) cos(z) of Example 9.1 is a very smooth function, the advantage of using S3_{(Ω, T}

M) instead of S1(Ω, TM) is a much smaller error at equal number of degrees of freedom. However, due to the smaller error, we have to increase C2

res for p = 3 which in turn yields to higher iteration numbers and, moreover, solving all Sv-subproblems takes a lot of computing time even if we apply the runtime optimal strategy. On the other hand we have a good approach for a possible parallelization. These Sv-subproblems are independent local problems, thus, if we compute them in parallel the computing time for higher polynomial degrees will decrease significantly.

10 Proof of Theorem

5.2

In this section we want to give a proof for Theorem 5.2. We make use of the notation of Section 5 and define operators Pi : V 7→ Vi and P : V 7→ V via

a(Piu, v) = a(u, v) ∀v ∈ Vi and P := K X

i=0

Pi. (8)

Thus P = C−1A and we have to derive a bound for κ(P ) = λmax(P )

λmin(P )

, (9)

where λmin(P ) = inf u∈V, u6=0

a(P u, u)

a(u, u) and λmax(P ) =u∈V, u6=0sup

a(P u, u) a(u, u) . Lemma 10.1 (Lemma of Lions). Let C0 be given by (2). Then for all u ∈ V

kuk2 A≤ C

2

0a(P u, u). Proof. Let u ∈ V and choose a representation u = PK

i=0ui such that K X i=0 kuik2A≤ C 2 0kuk 2 A with ui ∈ Vi. (10)

Then, applying (8) and Cauchy-Schwarz-inequality we obtain

kuk2 A= K X i=0 a(u, ui) = K X i=0 a(Piu, ui) ≤ K X i=0 kPiuk2A !1/2 _K X i=0 kuik2A !1/2

(23)

Lemma 10.2. Let ρ(E) be given by (3). Then for all u ∈ V a(P u, u) ≤ (1 + ρ(E))kuk2_A. Proof. We observe that for all u ∈ V and i ∈ {0, .., K}

kPiuk2A= a(Piu, Piu) = a(u, Piu). Hence we have kPiuk2A ≤ kukAkPiukA which is

kPiukA≤ kukA ∀u ∈ V. Next we consider k(P − P0)uk2A= K X i,j=1 a(Piu, Pju) ≤ K X i,j=1 eijkPiukAkPjukA.

Since E = [eij]Ki,j=1 is a symmetric matrix we obtain

k(P − P0)uk2A ≤ ρ(E) K X i=1 kPiuk2A= ρ(E) K X i=1 a(u, Piu) ≤ ρ(E)kukAk(P − P0)ukA.

Thus, we finally arrive at

a(P u, u) = a(P0u, u) + a ((P − P0)u, u) ≤ a(P0u, P0u) + k(P − P0)ukAkukA ≤ (1 + ρ(E))kuk2

A.

Lemma 10.1 and Lemma 10.2 together with (9) give he desired bound for the condition number of P

Theorem 10.3.

κ(P ) = κ(C−1A) ≤ C2

0(1 + ρ(E)).

References

[AMP00] Douglas N. Arnold, Arup Mukherjee, and Luc Pouly. Locally adapted tetrahedral meshes using bisection. SIAM J. Sci. Comput., 22(2):431– 448 (electronic), 2000.

(24)

[AO00] M. Ainsworth and T.J. Oden. A posteriori error estimation in finite element analysis. Wiley, 2000.

[AS97] M. Ainsworth and B. Senior. Aspects of an adaptive hp-finite element method: adaptive strategy, conforming approximation and efficient solvers. Comput. Meth. Appl. Mech. Engrg., 150:65–87, 1997.

[AS98] M. Ainsworth and B. Senior. An adaptive refinement strategy for hp-finite-element computations. Appl. Numer. Math., 26:165—178, 1998.

[AS99] M. Ainsworth and B. Senior. hp-finite element procedures on non-uniform geometric meshes. In M. Bern, J. Flaherty, and M. Luskin, editors, Grid generation and adaptive algorithms, pages 1–27. Springer Verlag, 1999. IMA Vol. Math. Appl. 113.

[BPX90] J. H. Bramble, J. E. Pasciak, and J. Xu. Parallel multilevel precondi-tioners. Mathematics of Computation, 55(191):1–22, 1990.

[BPX91] J. Bramble, J. Pasciak, and J. Xu. Parallel multilevel preconditioners. Math. Comput., 55:1–22, 1991.

[BR03] Wolfgang Bangerth and Rolf Rannacher. Adaptive finite element meth-ods for differential equations. Lectures in Mathematics ETH Z¨urich. Birkh¨auser Verlag, Basel, 2003.

[Bra97] D. Braess. Finite Elements. Cambridge University Press, 1997. [BS01] I. Babuˇska and T. Strouboulis. The finite element method and its

reliability. Oxford University Press, 2001.

[Cia76] P. G. Ciarlet. The Finite Element Method for Elliptic Problems. North-Holland Publishing Company, 1976.

[Eib06] T. Eibner. Randkonzentrierte und adaptive hp-FEM. PhD thesis, TU Chemnitz, 2006.

[EM07] T. Eibner and J. M. Melenk. Multilevel preconditioning for the bound-ary concentrated hp-fem. Comp. Meth. Mech. Eng. 196 (2007), pp. 3713-3725., 2007.

[HKK05] W. Hackbusch, B.N. Khoromskij, and R. Kriemann. Direct Schur complement method by domain decomposition based on H-matrix ap-proximation. Comput. Vis. Sci., 8(3-4):179–188, 2005.

[HS05] P. Houston and E. S¨uli. A note on the design of hp-adaptive finite element methods for elliptic partial differential equations. Comput. Meth. Appl. Mech. Engrg., 194:229–243, 2005.

(25)

[KL04] V.G. Korneev and U. Langer. Domain decomposition and precon-ditioning. In E. Stein, R. de Borst, and Th.J.R. Hughes, editors, Encyclopedia of Computational Mechanics, volume 1 of Chapter 22. John Wiley & Sons, 2004.

[MW01] J.M. Melenk and B. Wohlmuth. On residual-based a posteriori error estimation in hp-FEM. Advances in Comp. Math., 15:311–331, 2001. [Nep86] S.V. Nepomnyaschikh. Domain Decomposition and Schwarz

Meth-ods in a Subspace for the Approximate Solution of Elliptic Boundary Value Problems. PhD thesis, Computing Center of Siberian Division of Russian Academy of Sciences, 1986.

[Osw94] P. Oswald. Multilevel Finite Element Approximation. Teubner Skripten zur Numerik. Teubner, 1994.

[Sch98] C. Schwab. p- and hp-Finite Element Methods. Oxford University Press, 1998.

[SMPZ08] J. Sch¨oberl, J.M. Melenk, C. Pechstein, and S. Zaglmayr. Additive schwarz preconditioning for p-version triangular and tetrahedral finite elements. IMA J. Numer. Anal. 28 (2008), pp. 1-24, 2008.

[TW05] A. Toselli and O. Widlund. Domain Decomposition Methods — Algo-rithms and Theory. Springer Verlag, 2005.

[Ver96] R. Verf¨urth. A review of a posteriori error estimation and adaptive mesh refinement techniques. Teubner-Wiley, 1996.

[Xu90] J. Xu. Iterative methods by space decomposition and subspace correc-tion: A unifying approach. Report No. AM 67, Dep. of Mathematics, Penn. State University, 1990.

[Xu92] Jinchao Xu. Iterative methods by space decomposition and subspace correction. SIAM Review, 34:581–613, 1992.

[Zha92] S. Zhang. Multilevel schwarz methods. Numer. Math., 63:512–539, 1992.

(26)

(27)

(28)