2D Continuous Chebyshev-Galerkin Time-Spectral Method

Kristoffer Lindvall∗, Jan Scheffel

Division of Fusion Plasma Physics

School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden

Abstract

A fully spectral multi-domain method has been developed and applied to three applications: ideal MHD, compressible Navier-Stokes, and a two-fluid plasma turbulence model named the Weiland model. The time-spectral method employed is the Generalized Weighted Residual Method (GWRM), where all domains such as space, time, and parameter space are spectrally decomposed with Chebyshev polynomials. The spectral decomposition of the temporal domain allows the GWRM to reach spectral accuracy in all dimensions. The GWRM linear/nonlinear algebraic equations are solved using an Anderson Acceleration (AA) method and a newly developed Quasi Semi-Implicit root solver (Q-SIR). Up to 85% improved convergence rate was obtained for SIR as compared to AA, and in certain cases only Q-SIR converged. In the most challenging simulations, featuring steep gradients, the GWRM converged for time intervals roughly two times larger than typical time steps for explicit time-marching schemes, which are limited by the CFL condition. Time intervals up to 70 times larger than those of explicit time-marching schemes were used in smooth regions. Furthermore, the most computationally expensive algorithm, namely the product of two Chebyshev series, has been GPU accelerated with speedup gains of several thousands compared to a CPU.

Keywords: Chebyshev; time-spectral; GWRM; ODE; PDE; GPU.

∗Kristoffer Lindvall.

1. Introduction

Numerical methods for differential equations are routinely used for engineering applications. They allow us to harness vast amounts of computational power to solve highly complex problems. The benefits of using computers to solve complex problems are many: predicting the future state of a system (e.g. weather prediction), theoretical insight where in-situ experiments are at present not possible (e.g. black holes), prototype development (e.g. airplane design), and more.

A task can be accomplished with varying degrees of efficiency, and numerical methods handle certain tasks differently. It is therefore important to develop numerical methods that are general, so that they can be used in as many research fields as possible, whilst also being computationally efficient, stable, and accurate.

Numerical methods come in essentially two groups: local and global. An example of a local method is the finite-difference method [1]. As the name suggests, it introduces local grid points so that the derivative can be approximated with a finite-difference scheme. An example of a global method is the spectral method [2, 3, 4]. The spectral method introduces an ansatz that is valid globally, typically a series of polynomials, and substitutes that into the equations of interest. Each method has its advantages and disadvantages, and it is not uncommon to see hybrid methods that seek to include the advantages of both groups [5, 6].

Spectral methods have been a popular choice in fluid dynamics; however, most research on spectral methods has focused on treating the spatial dimensions. In 1989 Cockburn and Shu [7] introduced a Discontinuous Galerkin Spectral method that discretized the spatial domain and used an explicit Runge-Kutta method for time-stepping. In the same year Cai et al. [8] published a non-oscillatory Fourier Spectral method which also used Runge-Kutta for the time domain. The advantage of an explicit time-integration scheme over implicit or spectral schemes is that it does not require an iterative solver to calculate a time step. This approach of mixing spectral accuracy in space with explicit time-stepping is therefore simple to apply, and it lends itself to parallel computations.

However, explicit time-stepping schemes typically require very small steps, which can become computationally inefficient for large-scale problems, specifically when higher orders of accuracy are required [9, 10]. Certain mathematical models that feature a large separation of scales can also be an issue for explicit time-integration schemes, since the fast dynamics needs to be strictly followed due to the CFL condition.

To increase the stability of the time integration, implicit methods solve a system of equations in each time step. Implicit time integration has a long history in numerical studies going back to the 1960s and 1970s [11, 12]. A class of implicit methods was later published by L. Fezoui in 1989 [13], where the Euler equations were solved. These were shown to be more efficient than the explicit methods [13]; however, the implicit methods required approximate nonlinear terms and matrix operations. A more efficient implicit scheme was developed by Rasetarinera et al. (2001) [10], where a Newton-Krylov-Schwarz algorithm with Schwarz preconditioning was used. The method relies on efficient matrix operations and is highly parallelizable, but can feature problems with preconditioning when the number of elements is increased [10]. Implicit schemes can, however, suffer from low temporal accuracy, and they do not scale as well as explicit schemes on high-performance clusters.

Another method of solving the temporal domain is to use a spectral approach. In 1986 and 1989 Tal-Ezer [14, 15] introduced a spectral method for linear hyperbolic and parabolic equations that included the temporal domain in the spectral decomposition. The spectral method by Tal-Ezer used Chebyshev polynomials for the time domain and Fourier series for the spatial domain. The method presented spectral accuracy in all domains; however, it suffered from dense matrix inversions and was only applicable to a small set of problems. A more general three-dimensional multidomain collocation method was later developed by Luo (1997) [16]. During this time many methods have been developed [17, 18, 19, 20, 21] which show the applicability of spectral methods for time-marching.

Recent studies of time-spectral methods include the space-time discontinuous Galerkin (DG) methods [22, 23, 24, 25]. These are pseudo-spectral methods that are advantageous when using complex geometries such as quadrilateral, triangular, and prismatic meshes. The difference between the space-time DG method and the GWRM developed here is that the GWRM is a fully spectral method, whereas the space-time DG method is pseudo-spectral. All the computations of the GWRM are performed in spectral space, so the GWRM does not need to use Gauss quadrature and tensors. Also, continuous boundary conditions are used with the GWRM instead of matching the solution fluxes between interfaces.

The Generalized Weighted Residual Method (GWRM) was developed to use Chebyshev polynomials consistently in all domains [26, 27]. It was found for some cases to be more efficient than finite difference methods, namely the explicit Lax-Wendroff and implicit Crank-Nicolson methods. In Scheffel (2011) [26] the parameter domain was included in the solution ansatz when applying the GWRM. Later the GWRM was applied in Riva et al. (2017) [28], where it was used to solve the 2D Braginskii equations describing fusion plasma turbulence. However, in [28] only one spatial subdomain was implemented.

The numerical method developed here is specifically a 2D Continuous Chebyshev-Galerkin method with a spatio-temporal solution ansatz. This is the equivalent of a 2D multi-domain and time-adaptive version of the GWRM introduced in [26, 27]. The computational domain is discretized into multiple subdomains that feature overlapping regions to ensure pointwise and gradient continuity. The temporal domain is solved in time intervals that adapt to match the preset accuracy of the simulation.

The result of the discretization of the weak-form partial differential equations is a system of algebraic equations. Since the method is unconditionally stable, the maximum time step that the method can take depends solely on the accuracy that the solution requires and on the convergence properties of the root solver. Thus, it becomes necessary to use a root solver that is efficient in terms of memory, convergence rate, and global stability. In this regard a new Quasi Semi-Implicit root solver has been developed here and subsequently compared with the Anderson Acceleration method.

The GWRM, like all spectral methods, features spectral accuracy in smooth regions. However, in regions with steep gradients and shocks, the spectral method experiences Gibbs oscillations. This can be mitigated with appropriate artificial viscosity terms in the conservation laws. However, in order to keep spectral accuracy near steep gradients, other shock-capturing techniques or post-processing filters need to be applied.

The article begins in Sections 2-4 with a description of the three sets of conservation laws to be solved with the GWRM, namely the compressible Navier-Stokes equations, ideal magnetohydrodynamics, and an advanced fluid turbulence model called the Weiland model. In Section 5 the GWRM is presented, followed by Subsection 5.1 on parallelizing the GWRM with GPUs and Subsection 5.2 on the root solvers, namely the Quasi Semi-implicit root solver and the Anderson Acceleration method. Section 6 includes a brief description of the digital total variation filter used for post-processing. In Section 7 the results are presented, followed by a discussion and conclusion.

2. Compressible Navier-Stokes

The compressible Navier-Stokes (CNS) model is a theoretical hydrodynamic model that describes the macroscopic flow of a compressible fluid. The compressible Navier-Stokes equations are stated as follows,

$$ \frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0, \quad \text{[Continuity equation]} \tag{1} $$

$$ \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v} + \frac{1}{\rho}\nabla p - \kappa\nabla^2\mathbf{v} = 0, \quad \text{[Momentum equation]} \tag{2} $$

$$ \frac{d}{dt}\left(\rho p^{-\gamma}\right) = 0, \quad \text{[Energy equation]} \tag{3} $$

where ρ is the mass density, v is the velocity vector, p is the pressure, κ is the kinematic viscosity, and γ is the adiabatic index. Here the stress tensor has been approximated by neglecting heat conduction.

The compressible Navier-Stokes equations above model a compressible fluid that conserves mass, momentum, and energy. Here the force acting on a collection of particles, also called fluid elements, comes by way of pressure gradients. Also, fluid elements exchange momentum through the force supplied by the viscosity.

The Reynolds number is the ratio between inertial forces and viscous forces. For large Reynolds numbers, R >> 1, the flow becomes inviscid and the free energy in the system will quickly seed turbulence. In an inviscid fluid there exists a large range of scales, and a typical quality of 2D turbulence is the cascade of energy from smaller to larger scales.

A common benchmark case for the Navier-Stokes model, compressible and incompressible, is the Kelvin-Helmholtz instability (KHI). This test is used to simulate the transition from a laminar to a turbulent fluid, where inverse energy cascades create large-scale vortices. The KHI is also used to study the properties of a numerical method in terms of spatial and temporal accuracy, specifically involving the separation of scales in the turbulence.

3. Ideal Magnetohydrodynamics

The ideal Magnetohydrodynamic (MHD) model is a theoretical model that, like Navier-Stokes, treats the collective movement of particles as a fluid. The ideal MHD model differs from Navier-Stokes in the type of fluid it seeks to model. The ideal MHD fluid, called a plasma, is electrically conducting and thus responds to electromagnetic fields. By treating a collection of particles as a fluid element we can effectively eliminate the microscopic behaviour of the individual particles.

The ideal MHD equations are obtained by taking the mass, momentum, and energy moments of the Boltzmann equation and integrating over the particle velocities. To evaluate the integrals the plasma is considered collision dominated and thus a Maxwellian distribution function can be applied. The mass, momentum, and energy equations are then coupled to the Lorentz invariant Maxwell equations that describe the interplay and propagation of electromagnetic waves.

The single-fluid ideal MHD equations are stated as

$$ \frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0, \quad \text{[Continuity equation]} \tag{4} $$

$$ \rho\left(\frac{\partial \mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v}\right) + \nabla p - \frac{1}{\mu_0}(\nabla\times\mathbf{B})\times\mathbf{B} = 0, \quad \text{[Momentum equation]} \tag{5} $$

$$ \frac{\partial \mathbf{B}}{\partial t} - \nabla\times(\mathbf{v}\times\mathbf{B}) = 0, \quad \text{[Magnetic induction]} \tag{6} $$

$$ \frac{d}{dt}\left(\rho p^{-\gamma}\right) = 0. \quad \text{[Energy equation]} \tag{7} $$

Included are the density ρ, velocity v, pressure p, and the magnetic field B. The most important assumptions of the model are: neglect of displacement current, neglect of electron mass, non-relativistic velocities, quasi-neutrality $n_e = n_i$, and collision domination (characteristic length is much smaller than the Debye length, $L_c \ll \lambda_D$).

The test problem chosen here is the Orszag-Tang vortex (OTV). The OTV features many difficulties for numerical methods. These include: a large separation of scales, shock waves and shock-shock interaction, divergence cleaning, and the issue of small time steps due to the increase in spatial accuracy.

3.1. Divergence cleaning

The initial magnetic field satisfies the divergence constraint ∇·B = 0, and for a perfect solution the divergence of the magnetic field will remain zero for all time. However, numerical errors can introduce nonzero divergence of the magnetic field, leading to nonphysical solutions. Additional information needs to be added to the system in order to ensure that the above constraint is actually satisfied. The technique used here for divergence cleaning is that developed by Dedner et al. (2001) [29], where the magnetic induction equation is coupled to the divergence constraint by introducing a scalar field ψ,

$$ \frac{\partial \mathbf{B}}{\partial t} - \nabla\times(\mathbf{v}\times\mathbf{B}) + \nabla\psi = 0, \tag{8} $$

$$ \frac{\partial \psi}{\partial t} + c_h^2\,\nabla\cdot\mathbf{B} = -\frac{c_h^2}{c_p^2}\,\psi. \tag{9} $$

Here a mixed approach of propagation (first term) and dissipation (second term) is added to the evolution of the scalar function. This will allow the error to propagate and dissipate, in effect diffusing the error in the divergence constraint out of the domain. The coefficients $c_h$ and $c_p$ are determined from the fastest wave speed of the ideal MHD equations. Typically zero-valued Dirichlet boundary conditions are applied to Eq. (9) in order to force the error to zero at the edges. However, for ease of implementation periodic boundary conditions are applied here, noting that similar results can be achieved with both periodic and Dirichlet boundary conditions.

4. The Weiland model

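The third application for the GWRM is a higher-order partial differential equation that describes the turbulence in a fusion plasma, specifically the Ion-Temperature Gradient (ITG) drift-wave instability. The free energy provided by the gradients of the plasma properties such as pressure, density, temperature, and the magnetic field results in ubiquitous drift-zonal waves in the plasma.

The ITG mode becomes unstable when a critical temperature gradient is surpassed, and the evolution of the ITG can be described by a two-fluid turbulence model named the Weiland model [49, 50, 47]. This model is an advanced fluid model that is derived from first principles. First, the momentum equation derived from Braginskii's equations is used to obtain the gyrocenter drifts

$$ \tilde{\mathbf{v}}_E = \frac{1}{B}\left(\hat{\mathbf{e}}_\parallel\times\nabla\phi\right), \quad \text{[E × B drift]} \tag{10} $$

$$ \mathbf{v}_{*j} = \frac{1}{q_j n_j B}\left(\hat{\mathbf{e}}_\parallel\times\nabla p_j\right), \quad \text{[Diamagnetic drift]} \tag{11} $$

$$ \tilde{\mathbf{v}}_{*j} = \frac{1}{q_j n_j B}\left(\hat{\mathbf{e}}_\parallel\times\nabla\delta p_j\right), \quad \text{[Perturbed diamagnetic drift]} \tag{12} $$

$$ \mathbf{v}_{\pi j} = \frac{1}{q_j n_j B}\left(\hat{\mathbf{e}}_\parallel\times\nabla\cdot\overleftrightarrow{\pi}_j\right), \quad \text{[Stress tensor drift]} \tag{13} $$

$$ \mathbf{v}_{pj} = -\frac{1}{B\omega_{cj}}\left(\frac{\partial}{\partial t} + \mathbf{v}_j\cdot\nabla\right)\nabla\phi, \quad \text{[Polarization drift]} \tag{14} $$

$$ \mathbf{v}_{Di} = \frac{2p_j}{q n_j B^2}\left(\hat{\mathbf{e}}_\parallel\times\nabla B\right), \quad \text{[Magnetic drift]} \tag{15} $$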

where we have the electrostatic potential φ, pressure p, stress tensor $\overleftrightarrow{\pi}$, electric charge q, density n, magnetic field B, and gyrofrequency $\omega_{cj} = q_j B/m_j$. The subscript j denotes either ion or electron species, and the symbols "δ" and "∼" denote a perturbed variable and a perturbed velocity vector, respectively. The system of equations to be coupled then consists of the continuity equation, with all of the drift velocities included, and the energy equation:

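$$ \frac{\partial n_i}{\partial t} + \tilde{\mathbf{v}}_E\cdot\nabla n_i + n_i\nabla\cdot\tilde{\mathbf{v}}_E + \nabla\cdot(n_i\tilde{\mathbf{v}}_{*i}) + \nabla\cdot(n_i\mathbf{v}_{\pi i}) + \nabla\cdot(n_i\mathbf{v}_{pi}) = 0, \tag{16} $$

$$ \frac{3}{2}n_i\left(\frac{\partial}{\partial t} + \mathbf{v}_c\cdot\nabla\right)T_i - T_i\left(\frac{\partial}{\partial t} + \mathbf{v}_c\cdot\nabla\right)n_i = -\frac{5}{2}n_i\mathbf{v}_{Di}\cdot\nabla T_i, \tag{17} $$

where $\mathbf{v}_c$ denotes the convective terms of the guiding center drift velocity (excluding the diamagnetic drift since it cancels exactly). The $\mathbf{v}_c$ drift can be approximated to lowest order by the E × B drift. The continuity and energy equations above are closed by assuming that the heat flux is equal to the diamagnetic heat flux obtained from the stress tensor,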

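$$ \nabla\cdot\mathbf{q}_i = \nabla\cdot\mathbf{q}_{*i} = -\frac{5}{2}n_i\mathbf{v}_{*i}\cdot\nabla T_i + 5\,\ldots \tag{18} $$

and the FLR terms are included from the polarization and stress tensor drifts,

$$ \nabla\cdot\left[n_i(\mathbf{v}_p + \mathbf{v}_\pi)\right] = -\frac{n_i}{T_e}\rho_s^2\frac{\partial}{\partial t}\Delta(e\phi) - \frac{\rho_s^2}{T_e}\Delta\frac{\partial\delta p_i}{\partial t} - \frac{n_i}{T_e}\rho_s^2(\tilde{\mathbf{v}}_{*i}\cdot\nabla)\Delta(e\phi) - \frac{n_i}{T_e}\rho_s^2(\tilde{\mathbf{v}}_E\cdot\nabla)\Delta(e\phi). \tag{19} $$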

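From the above equations the Weiland model for a two-fluid toroidal ion-temperature gradient model can be derived. The Weiland model with adiabatic electrons is stated as

$$ \left(\frac{\partial}{\partial t} + \tilde{\mathbf{v}}_E\cdot\nabla\right)\delta T_i + \left(\frac{7}{3} + \frac{2}{3}\rho_s^2\Delta\right)\mathbf{v}_{Di}\cdot\nabla\delta T_i + \left\{\frac{1}{\tau}\left[\eta_i + \frac{2}{3}\left(1 + \frac{1+\eta_i}{\tau}\right)\rho_s^2\Delta\right]\mathbf{v}_{*e} + \frac{2}{3}\left(1 + \frac{1}{\tau}\right)(1 + \rho_s^2\Delta)\mathbf{v}_{Di}\right\}\cdot\nabla e\phi = \frac{2}{3\tau}\rho_s^2(\tilde{\mathbf{v}}_{*i} + \tilde{\mathbf{v}}_E)\cdot\nabla\Delta e\phi, \tag{20} $$

$$ \left(\frac{\partial}{\partial t} + \tilde{\mathbf{v}}_E\cdot\nabla\right)(1 - \rho_s^2\Delta)e\phi + \left[\left(1 + \frac{1+\eta_i}{\tau}\rho_s^2\Delta\right)\mathbf{v}_{*e} + (1+\tau)\mathbf{v}_{Di}\right]\cdot\nabla e\phi + \tau\mathbf{v}_{Di}\cdot\nabla\delta T_i = \rho_s^2(\tilde{\mathbf{v}}_{*i}\cdot\nabla)\Delta e\phi, \tag{21} $$

where φ and $\delta T_i$ are the perturbed electrostatic potential and ion temperature, respectively. The free parameters in the model are $\eta_i = L_n/L_T$ ($L_n$ and $L_T$ are the density and temperature length scales), $\tau = T_e/T_i$, and $\rho_s = \sqrt{\tau/2}\,\rho_i$ ($\rho_i$ is the ion Larmor radius).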

5. A time-spectral method

The time-spectral method presented here is based on the Generalized Weighted Residual Method (GWRM) (Scheffel [26, 27]) of the Galerkin kind. To see how this works we start with an arbitrary partial differential equation,

$$ \frac{\partial u}{\partial t} = D[u] + f, \tag{22} $$

where u(t, x; p) is the solution vector, D is a linear or nonlinear operator, and f(t, x; p) is a source term, defined on the domain $(t, x; p) \in \mathbb{R}^n$. Note here that the solution vector u and the source term f can depend on time, space, and parameter space. A solution ansatz is introduced that seeks to approximate u with a series of Chebyshev polynomials of the first kind, $T_k(x) = \cos(k \arccos(x))$. For simplicity we employ one spatial and one parameter domain, giving us the finite series approximation

$$ u(t, x; p) \approx u^N(\tau, \xi; P) = \sum_{k=0}^{K}{}' \sum_{l=0}^{L}{}' \sum_{m=0}^{M}{}' a_{klm} T_k(\tau) T_l(\xi) T_m(P), \tag{23} $$

where $a_{klm}$ are the Chebyshev coefficients that become the new unknowns of the system, and τ, ξ, and P are the spectral-space variables defined in the cube $[-1, 1]^3$. Here the prime symbol denotes that the zeroth terms are divided by two. Since the Chebyshev polynomials are orthogonal on the interval [−1, 1], the accuracy of the linear combination of Chebyshev polynomials increases with the mode numbers K, L, and M that are chosen.
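As an illustration of the ansatz in Eq. (23), the following minimal sketch evaluates a primed double Chebyshev sum in time and one spatial variable (the parameter dimension is dropped for brevity). The function names `to_spectral`, `cheb_t`, and `eval_series_2d` and the example coefficients are assumptions made for this sketch and are not taken from the GWRM code.

```python
import math

def to_spectral(z, z0, z1):
    """Map a physical coordinate z in [z0, z1] linearly onto [-1, 1]."""
    return (2.0 * z - (z0 + z1)) / (z1 - z0)

def cheb_t(k, x):
    """Chebyshev polynomial of the first kind, T_k(x) = cos(k arccos x)."""
    return math.cos(k * math.acos(x))

def eval_series_2d(a, tau, xi):
    """Evaluate the primed double sum of a[k][l] T_k(tau) T_l(xi)."""
    total = 0.0
    for k, row in enumerate(a):
        wk = 0.5 if k == 0 else 1.0      # prime: zeroth term is halved
        for l, akl in enumerate(row):
            wl = 0.5 if l == 0 else 1.0
            total += wk * wl * akl * cheb_t(k, tau) * cheb_t(l, xi)
    return total

# u(tau, xi) = tau * xi**2 = T_1(tau) * (T_0(xi) + T_2(xi)) / 2
a = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.5]]
tau = to_spectral(1.5, 0.0, 2.0)   # t = 1.5 on [0, 2] maps to tau = 0.5
u = eval_series_2d(a, tau, 0.3)    # approximates 0.5 * 0.3**2
```

Because of the primed sums, the coefficient of $T_1(\tau)T_0(\xi)$ is stored as 1 although it contributes with weight 1/2.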

In order to solve Eq. (22) it first needs to be discretized into a finite problem set. The GWRM does this by multiplying the residual of Eq. (22) by so-called test functions and their respective weights, and subsequently integrating over the entire computational domain Ω = {[t0, t1] × [x0, x1] × [p0, p1]}. This effectively transforms the strong form of Eq. (22) into the so-called weak formulation (25).

$$ R^N = \frac{\partial u^N}{\partial t} - D[u^N] - f \neq 0, \tag{24} $$

$$ \int_\Omega R^N\, T_q(\tau) T_r(\xi) T_s(P)\, w_t w_x w_p\, d\Omega = 0. \tag{25} $$

When the approximate solution $u^N$ is introduced, the residual will not be strictly equal to zero, as can be seen in Eq. (24). However, by forcing the inner product of the residual and the weighted Chebyshev polynomials to zero the GWRM will seek to minimize the residual globally.

The resultant set of linear or nonlinear algebraic Eqs. (25) is then supplemented with initial and boundary conditions. In the formulation above the temporal and spatial domains have been treated identically, thus the temporal domain can be seen as having a boundary at $t_0 = 0$, akin to the spatial boundary conditions. However, a more intuitive way of introducing the initial condition can be obtained by first integrating the original partial differential equation, here giving us another form of the residual

$$ R^{*N} = u^N - \left[u_0^N + \int_0^t \left(D[u^N] + f\right)dt'\right], \tag{26} $$

where now the initial condition $u_0^N = u^N(0, x; p)$ is explicitly introduced. Substituting Eq. (26) into Eq. (25) and introducing the linear transformation $B_z = (z_1 - z_0)/2$ ($z = t, x, p$), the integrals can be solved analytically, resulting in the form

$$ a_{qrs} = 2\delta_{q0} b_{rs} + A_{qrs} + F_{qrs}, \tag{27} $$

where $b_{rs}$ is the decomposed initial condition, $A_{qrs}$ are the Chebyshev coefficients of the expansion of the second term in Eq. (25), and $F_{qrs}$ are the Chebyshev coefficients of the expansion of the third term. Eq. (27) is defined for the modes 0 ≤ k ≤ K, 0 ≤ l ≤ L, and 0 ≤ m ≤ M.

The integral terms are defined for K + 1 modes; however, due to the minimax properties of Chebyshev polynomials the K + 1 mode can be discarded so that the system is well-posed. The differential operator will reduce the expansion coefficients to L − r, where r is the order of the differential operator. The highest L modes are recovered by adding boundary conditions for each r, i.e. with r = 2 the equations for the L − 1 and L modes become boundary conditions.
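The mode counting above can be seen in a small coefficient-space sketch: integrating a primed Chebyshev series adds one mode, and differentiating removes one. The sketch below works on the reference interval [−1, 1] only (the interval scaling $B_z$ is omitted), uses the standard Chebyshev recurrences, and the function names `cheb_integrate` and `cheb_differentiate` are assumptions for this illustration rather than the GWRM modules themselves.

```python
def cheb_integrate(a):
    """Primed coefficients of F(x) = integral of f from -1 to x.

    a holds the primed coefficients of f; the result has one more mode.
    """
    n = len(a)
    ext = list(a) + [0.0, 0.0]           # pad so that ext[k + 1] always exists
    b = [0.0] * (n + 1)
    for k in range(1, n + 1):
        b[k] = (ext[k - 1] - ext[k + 1]) / (2.0 * k)
    # Fix the constant so that F(-1) = 0, using T_k(-1) = (-1)**k.
    b[0] = 2.0 * sum((-1) ** (k + 1) * b[k] for k in range(1, n + 1))
    return b

def cheb_differentiate(a):
    """Primed coefficients of f' via d[k-1] = d[k+1] + 2 k a[k] (downward)."""
    n = len(a)
    if n == 1:
        return [0.0]
    d = [0.0] * (n + 1)                  # d[n-1] and d[n] stay zero
    for k in range(n - 1, 0, -1):
        d[k - 1] = d[k + 1] + 2.0 * k * a[k]
    return d[: n - 1]

f = [0.0, 1.0]                 # f(x) = x = T_1(x)
F = cheb_integrate(f)          # (x**2 - 1) / 2  ->  [-0.5, 0.0, 0.25]
f_back = cheb_differentiate(F) # recovers [0.0, 1.0]
```

Integration followed by differentiation returns the original coefficients, which is a convenient consistency check for such modules.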

5.1. GPU-acceleration

The Chebyshev modules (differentiation, integration, and product) in the GWRM are highly efficient ways of computing derivatives and integrals, with the exception of the Chebyshev product. Performing the product of two polynomial series in spectral space is widely seen as inefficient, hence the use of pseudo-spectral methods that perform products in real space. However, the minimax property of Chebyshev polynomials allows the resultant product series to be truncated. Thus, higher-order mode multiplications can be neglected, making the Chebyshev product module relatively efficient compared to other polynomials. The product of two two-dimensional Chebyshev series is

$$
\begin{aligned}
c_{kl} = \frac{1}{4}\Biggl\{
& \sum_{i=0}^{k}\Biggl[\sum_{j=0}^{l} a_{i,j}b_{k-i,l-j} + \sum_{j=1}^{L-l} a_{i,j}b_{k-i,l+j} + \sum_{j=1}^{L-l} a_{i,l+j}b_{k-i,j}\Biggr] \\
+ & \sum_{i=1}^{K-k}\Biggl[\sum_{j=0}^{l} a_{i,j}b_{k+i,l-j} + \sum_{j=1}^{L-l} a_{i,j}b_{k+i,l+j} + \sum_{j=1}^{L-l} a_{i,l+j}b_{k+i,j}\Biggr] \\
+ & \sum_{i=1}^{K-k}\Biggl[\sum_{j=0}^{l} a_{k+i,j}b_{i,l-j} + \sum_{j=1}^{L-l} a_{k+i,j}b_{i,l+j} + \sum_{j=1}^{L-l} a_{k+i,l+j}b_{i,j}\Biggr]
\Biggr\}. 
\end{aligned} \tag{28}
$$

The way to accelerate this formula is to recognize how much of the computation is actually performed privately, i.e. which computations can be run in parallel. There are several ways to parallelize existing codes, e.g. using multiple cores and threads on CPUs (central processing units), and/or the massive number of CUDA (Compute Unified Device Architecture) cores in GPUs (graphics processing units). The GPU consists of Streaming Multiprocessors (SMs) that contain CUDA cores with shared memory, which can be used to perform a massive number of floating point operations in parallel.

Existing codes written in C/C++ or Fortran can be GPU accelerated with the CUDA toolkit developed by Nvidia. The toolkit allows you to send and receive data structures to and from the GPU and to specify how many threads are to be allocated to the task. How the threads perform the task and how they communicate are called thread strategies.

In order to GPU accelerate the Chebyshev product module, two thread strategies are employed: the monolithic kernel and the stride kernel. The monolithic kernel applies one thread per loop index, e.g. a loop i = 1..5 will have five threads performing the contents of the loop in parallel. This strategy is simple and performs well compared to a parallel CPU. The disadvantage of the monolithic kernel is that not enough threads are deployed, so the GPU is not used to its full potential. Below is the CUDA-C code for the monolithic kernel.

    // Monolithic kernel: one thread per output mode i of the 1D Chebyshev product.
    // The kernel name and argument list are illustrative.
    __global__ void chebProductMonolithic(const double *a, const double *b,
                                          double *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        double sum;
        if (i < n) {
            sum = 0.0;
            for (int j = 0; j <= i; j++) {
                sum += a[j] * b[i - j];
            }
            for (int j = 1; j < n - i; j++) {
                sum += a[j] * b[j + i] + a[j + i] * b[j];
            }
            c[i] = 0.5 * sum;
        }
    }
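The arithmetic performed by the monolithic kernel can be checked on the CPU with a short reference implementation. The sketch below mirrors the two inner loops of the kernel for the one-dimensional, truncated product of two primed Chebyshev series; the function name `cheb_product` is an assumption for this illustration.

```python
def cheb_product(a, b):
    """Truncated product of two primed Chebyshev series with n modes each.

    The outer loop over i corresponds to one GPU thread per output mode.
    """
    n = len(a)
    c = [0.0] * n
    for i in range(n):
        s = 0.0
        for j in range(i + 1):           # terms with mode sum j + (i - j) = i
            s += a[j] * b[i - j]
        for j in range(1, n - i):        # terms with mode difference i
            s += a[j] * b[j + i] + a[j + i] * b[j]
        c[i] = 0.5 * s
    return c

# T_1(x) * T_1(x) = (T_0(x) + T_2(x)) / 2; in primed form c_0 = 1, c_2 = 1/2.
t1 = [0.0, 1.0, 0.0, 0.0]
c = cheb_product(t1, t1)
```

Since each output mode `c[i]` depends only on the inputs, the outer loop has no data dependencies between iterations, which is what makes the one-thread-per-index strategy applicable.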

To saturate the GPU threads the stride kernel deploys blocks of threads per loop index. The idea behind the stride kernel is to use more threads, and also to perform the contents of the loops in parallel. This strategy is suitable for the Chebyshev product module because the blocks of threads can perform the double summations in parallel with a reduction sum. A simplified CUDA-C code of the stride kernel is presented below.

    // Stride kernel: one block of 32 threads (one warp) per output mode k.
    // The kernel name and argument list are illustrative.
    __global__ void chebProductStride(const double *a, const double *b,
                                      double *c, int n)
    {
        const unsigned row_mask = ~((0xFFFFFFFFU >> 5) << 5);  // lowest five bits
        int k = blockIdx.x;
        int row_width = ((k > (n - k)) ? k : (n - k)) + 1;
        // Number of 32-wide strides needed to cover row_width entries.
        int strides = (row_width >> 5) + ((row_width & row_mask) ? 1 : 0);
        int j = threadIdx.x;
        double tmp_a = 0.0;
        double sum = 0.0;
        for (int s = 0; s < strides; s++) {
            if (j < n) tmp_a = a[j];
            if (j <= k) sum += tmp_a * b[k - j];
            if ((j > 0) && (j < (n - k))) sum += tmp_a * b[j + k] + a[j + k] * b[j];
            j += 32;
        }
        // Warp-level reduction of the per-thread partial sums.
        for (int offset = warpSize >> 1; offset > 0; offset >>= 1) {
            sum += __shfl_down_sync(0xFFFFFFFFU, sum, offset);
        }
        if (!threadIdx.x) c[k] = 0.5 * sum;
    }

5.2. Root solver

The algebraic Eqs. (27) can be cast in the form $x = \phi(x): \mathbb{R}^N \to \mathbb{R}^N$. This can be solved with a root solver of choice that iterates from an initial guess $x^0$ to an approximate solution $x^* = \phi(x^*)$ with a chosen tolerance. Here the semi-implicit root solver (SIR), with its quasi-implementation, is presented along with the Anderson Acceleration (AA) method.

First, the algebraic equations can be solved via the fixed-point iteration,

$$ x^{i+1} = \phi(x^i). \tag{29} $$

This is the starting point, or basis, for every root solver. Since the fixed-point iteration converges excessively slowly, if at all, it becomes attractive to improve the convergence rate and the global stability.
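To make the slow convergence of Eq. (29) concrete, the following minimal sketch applies the fixed-point iteration to the scalar test equation x = cos(x). The test equation, the stopping criterion on successive iterates, and the function name `fixed_point` are assumptions chosen for illustration.

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=500):
    """Iterate x_{i+1} = phi(x_i) until successive iterates differ by < tol."""
    x = x0
    for i in range(1, max_iter + 1):
        x_new = phi(x)
        if abs(x_new - x) < tol:
            return x_new, i
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

root, iterations = fixed_point(math.cos, 1.0)
```

For this example the error shrinks by a roughly constant factor per iteration (linear convergence), so several tens of iterations are needed to reach the tolerance.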

Semi-implicit root solver

The Semi-implicit root solver (SIR) developed in [30] is a generalization of the Newton method. SIR casts the fixed-point iteration scheme in a semi-implicit form that has the same root as the original equation,

$$ x^{i+1} + \beta x^{i+1} = \beta x^i + \phi(x^i), \tag{30} $$

which can then be reformulated so that the advanced iterate is isolated,

$$ x^{i+1} = \Phi(x^i, \alpha) = \alpha\,(x^i - \phi(x^i)) + \phi(x^i). \tag{31} $$

The result is an iterative scheme with a free parameter α = β/(1 + β). As stated in [30], in one dimension it is sufficient for convergence near the root that $|\Phi'(x^i, \alpha)| < 1$ (the prime denotes the derivative with respect to x). Here we can note that the Newton method sets $|\Phi'(x^i, \alpha)| = 0$, which allows the method to achieve second-order convergence if the initial iterate is sufficiently close to the root. If the problem is multivariate then the necessary convergence criterion becomes

$$ \max_{1\le m\le N} \sum_{n=1}^{N} |\Phi_{mn}| < 1. \tag{32} $$

By enforcing monotonic convergence SIR can achieve global stability. This is done by setting $\Phi'(x^i, \alpha^i) = R^i$, where $R^i$ is a diagonal matrix $(R)_{mn} = \delta_{mn}R_m$ with values $0 < R_m < 1$. In order to achieve similar second-order convergence rates near the root, $R^i$ is subsequently decreased towards zero in each iterate.

Thus the resulting generalized form of SIR is,

$$ x^{i+1} = A^i\,(x^i - \phi(x^i)) + \phi(x^i), \tag{33} $$

where $A^i = I + (R^i - I)J^{-1}$ and J is the system Jacobian. This can, however, be reformulated into a compact version resembling the Newton method, namely

$$ x^{i+1} = x^i + (R^i - I)J^{-1}f(x^i), \tag{34} $$

where $f(x) = x - \phi(x)$. Here we see again that the Newton method is recovered when $R^i = 0$.
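A scalar version of Eq. (34) can be written in a few lines. The sketch below is only an illustration under assumed choices: the test function f(x) = x − cos(x), the initial value R = 0.5, and the geometric reduction of R by a constant factor are not prescribed by the text, and the name `sir_scalar` is hypothetical.

```python
import math

def sir_scalar(f, dfdx, x0, r0=0.5, r_fac=0.5, tol=1e-12, max_iter=100):
    """Scalar form of Eq. (34): x_{i+1} = x_i + (R_i - 1) f(x_i) / f'(x_i)."""
    x, r = x0, r0
    for i in range(1, max_iter + 1):
        x = x + (r - 1.0) * f(x) / dfdx(x)
        if abs(f(x)) < tol:
            return x, i
        r *= r_fac          # drive R towards zero, approaching Newton's method
    raise RuntimeError("SIR did not converge")

f = lambda x: x - math.cos(x)
dfdx = lambda x: 1.0 + math.sin(x)

x_sir, n_sir = sir_scalar(f, dfdx, 1.0)                # damped start, 0 < R < 1
x_newton, n_newton = sir_scalar(f, dfdx, 1.0, r0=0.0)  # R = 0: plain Newton
```

With R = 0 the update reduces to the Newton step, while 0 < R < 1 shortens the step, which is the mechanism used to enforce monotonic convergence.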

As has been pointed out by many numerical practitioners, the system Jacobian is often too expensive and time-consuming to calculate. To remedy this, quasi methods have been developed in which an approximation of the Jacobian (evaluated in the initial step) is updated in each successive iteration [31, 32, 33, 34].

The Quasi Semi-implicit root solver (Q-SIR) employed here introduces an approximate matrix for the inverse of the Jacobian, $H = J^{-1}$, and updates the inverse in each iteration. In Algorithm 1 three different updates are presented, namely the "Good" Broyden, "Bad" Broyden, and Broyden-Fletcher-Goldfarb-Shanno (BFGS) methods. The first two are rank-one updates of the matrix H, and the third is a rank-two update. The rank of a matrix is the number of its linearly independent rows (equivalently, columns). The rank-one and rank-two updates require matrix-vector multiplications and matrix-matrix multiplications, respectively.
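The Q-SIR iteration with the "Good" Broyden update, as listed in Algorithm 1, can be sketched in plain Python as follows. Several choices here are assumptions made for illustration only: H is initialized to the identity instead of an evaluated inverse Jacobian, all diagonal entries of R start at the same value, and the two-equation test system and the helper names `matvec`, `dot`, and `q_sir` are hypothetical.

```python
import math

def matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def q_sir(f, x0, r0=0.5, r_fac=0.5, tol=1e-10, max_iter=100):
    """Q-SIR with the "Good" Broyden update of H, the approximate inverse Jacobian."""
    n = len(x0)
    h = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # assumed H = I
    r_diag = [r0] * n                      # diagonal of R
    x_old, f_old = list(x0), f(x0)
    for i in range(1, max_iter + 1):
        hf = matvec(h, f_old)
        x_new = [x_old[m] + (r_diag[m] - 1.0) * hf[m] for m in range(n)]
        f_new = f(x_new)
        if math.sqrt(dot(f_new, f_new)) < tol:
            return x_new, i
        s = [x_new[m] - x_old[m] for m in range(n)]
        y = [f_new[m] - f_old[m] for m in range(n)]
        hy = matvec(h, y)
        s_h = [sum(s[r] * h[r][c] for r in range(n)) for c in range(n)]  # s^T H
        denom = dot(s, hy)                                               # s^T H y
        for r in range(n):
            for c in range(n):
                h[r][c] += (s[r] - hy[r]) * s_h[c] / denom
        r_diag = [r_fac * rm for rm in r_diag]
        x_old, f_old = x_new, f_new
    raise RuntimeError("Q-SIR did not converge")

def f_test(x):
    # Hypothetical test system: x0 = 0.5 cos(x1), x1 = 0.5 sin(x0)
    return [x[0] - 0.5 * math.cos(x[1]), x[1] - 0.5 * math.sin(x[0])]

root, iterations = q_sir(f_test, [0.0, 0.0])
```

Only matrix-vector products and one outer-product update are needed per iteration, so no Jacobian is evaluated after the initial estimate of H.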

    Input:  a vector function f(x): R^N → R^N and an initial estimate x0 ∈ R^N.
    Output: a vector x*, being a root of the equation f(x*) = 0.
    Data:   N: number of equations to be solved; tol: solution accuracy;
            Imax: max number of iterations; Rfac: factor controlling R at each iteration.

    R := δ_mn R_m;  I := identity matrix
    H = J^{-1}
    for i = 1 to Imax do
        x1 = x0 + (R − I) H f(x0)
        ε = ||f(x1)||_2                       // 2-norm
        if ε < tol then
            break
        end
        s = x1 − x0
        y = f(x1) − f(x0)
        Case "Good" Broyden: H = H + (s − H y)(s^T H)/(s^T H y)
        Case "Bad" Broyden:  H = H + (s − H y) y^T/(y^T y)
        Case BFGS:           H = (I − (s y^T)/(y^T s)) H (I − (y s^T)/(y^T s)) + (s s^T)/(y^T s)
        R = Rfac R
        x0 = x1
    end
    x* = x1

Algorithm 1: SIR: Semi-implicit root solver

Anderson Acceleration

The Anderson Acceleration (AA) method is a nonlinear root solver developed by D. G. Anderson [35]. The AA method accelerates the fixed-point iteration, and it has the advantage over other root solvers in the Newton class that it does not need to calculate the system Jacobian. Instead, it seeks to solve a minimization problem. The basic AA algorithm presented by Walker et al. [36] can be seen in Algorithm 2.

    Input:  a vector function g(x): R^N → R^N and an initial estimate x_0 ∈ R^N.
    Output: a vector x*, being a root of the equation g(x*) = x*.

    Given x_0 and m ≥ 1
    Set x_1 = g(x_0)
    for k = 1, 2, ... do
        Set m_k = min{m, k}
        Set F_k = (f_{k−m_k}, ..., f_k), where f_k = g(x_k) − x_k
        Determine α^(k) = (α_0^(k), ..., α_{m_k}^(k))^T that solves
            min_{α = (α_0, ..., α_{m_k})^T} ||F_k α||_2  s.t.  Σ_{i=0}^{m_k} α_i = 1
        Set x_{k+1} = Σ_{i=0}^{m_k} α_i^(k) g(x_{k−m_k+i})
    end

Algorithm 2: Anderson Acceleration of fixed-point iteration

Algorithm 2 of the AA method seeks to find a set of coefficients $\alpha^{(k)}$ that minimizes the linear combination $\sum_{j=0}^{m_k} f_{k-m_k+j}\,\alpha_j^{(k)}$. Intuitively, the AA method attempts to update the fixed-point iteration in the direction that minimizes the stored residuals the most. The original algorithm by [35] can be reformulated to an unconstrained version as in [36, 45],

$$ x_{k+1} = g(x_k) - \sum_{j=1}^{m_k}\left[g(x_{k-m_k+j}) - g(x_{k-m_k+j-1})\right]\gamma_j^{(k)}, \tag{35} $$

where the coefficients $\gamma^{(k)}$ are determined to solve the least-squares problem $\min_{\gamma=(\gamma_0,\ldots,\gamma_{m_k-1})^T}\|f_k - F_k\gamma\|_2$, where $F_k$ is the matrix $[f(x_{k-m_k+j}) - f(x_{k-m_k+j-1})]_j$. The damping/relaxation version of the unconstrained AA method is then

$$ x_{k+1} = g(x_k) - \sum_{j=1}^{m_k}\left[g(x_{k-m_k+j}) - g(x_{k-m_k+j-1})\right]\gamma_j^{(k)} - (1-\beta)\left(f(x_k) - \sum_{j=1}^{m_k}\left[f(x_{k-m_k+j}) - f(x_{k-m_k+j-1})\right]\gamma_j^{(k)}\right). \tag{36} $$

The main benefit of the unconstrained version of the AA method is that the matrix $F_k$ can be decomposed with QR decomposition. Furthermore, for each successive iteration k only the QR matrices need to be updated. The method used in solving the GWRM algebraic equations is the unconstrained AA method with relaxation and QR decomposition, see Algorithm 3. Furthermore, the AA method is a 'matrix-free' root solver, i.e. no Jacobian or inverse needs to be stored in memory.

Input: A vector function g(x) : R^N → R^N and an initial estimate x0 ∈ R^N.

Output: A vector x∗ that solves the fixed-point equation g(x∗) = x∗.

Data: N - number of equations to be solved, tol - solution accuracy, Imax - max number of iterations, mmax - max number of stored residuals.

    mAA = 0; F = []; G = [];
    for k = 0 to Imax do
        g1 = g(x0);
        f1 = g(x0) − x0;
        ε = ||f1||2;    // 2-norm
        if ε < tol then
            break;
        end
        if mAA = 0 then
            x0 = g1;
        else
            if mAA < mmax then
                F = concat(F, f1 − f0);
                G = concat(G, g1 − g0);
            else
                F = concat(F[:, 2 : mAA], f1 − f0);
                G = concat(G[:, 2 : mAA], g1 − g0);
                mAA = mAA − 1;
            end
            Q, R = QRdecomp(F);
            γk = LeastSquares([Q, R], f1);
            x0 = g1 − Gγk;
        end
        f0 = f1; g0 = g1;
        mAA = mAA + 1;
    end
    x∗ = x0;

Algorithm 3: AA - Anderson Acceleration w/ QR-decomp.
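As an illustration of the damped update (36), the following is a minimal NumPy sketch, not the GWRM implementation. The function name `anderson` and its arguments are hypothetical, the residual is taken as f(x) = g(x) − x, and for simplicity the QR factorization is recomputed in every iteration instead of being updated.

```python
import numpy as np

def anderson(g, x0, tol=1e-10, max_iter=200, m_max=5, beta=1.0):
    """Damped, unconstrained Anderson acceleration for x = g(x).

    Returns the approximate fixed point and the number of iterations used.
    """
    x = np.asarray(x0, dtype=float)
    dF, dG = [], []              # columns of F and G (differences of f and g)
    f_prev = g_prev = None
    for k in range(max_iter):
        gx = g(x)
        f = gx - x               # residual f(x) = g(x) - x
        if np.linalg.norm(f) < tol:
            return x, k
        if f_prev is None:
            # No history yet: (damped) fixed-point step
            x_new = gx - (1.0 - beta) * f
        else:
            dF.append(f - f_prev)
            dG.append(gx - g_prev)
            if len(dF) > m_max:  # drop the oldest stored difference
                dF.pop(0)
                dG.pop(0)
            F = np.column_stack(dF)
            G = np.column_stack(dG)
            # Least-squares problem min ||f - F gamma||_2 via QR
            Q, R = np.linalg.qr(F)
            gamma = np.linalg.lstsq(R, Q.T @ f, rcond=None)[0]
            x_new = gx - G @ gamma - (1.0 - beta) * (f - F @ gamma)
        f_prev, g_prev = f, gx
        x = x_new
    return x, max_iter
```

For example, `anderson(np.cos, np.array([0.0, 0.5, 1.0]), m_max=2)` iterates towards the fixed point of the cosine in each component. Only the difference vectors are stored, which reflects the matrix-free character of the method.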

6. Filtering

In order to relax the steep gradients present in both the Navier-Stokes and ideal MHD simulations artificial viscosity in the form $\nu\nabla^2$ (ν artificial [...] scalar ψ in the ideal MHD case). These are set low enough to allow substantial gradients and Gibbs oscillations to occur. Then, in order to recover the underlying resolution of the spectral method, the Gibbs oscillations need to be removed. A multitude of techniques exist that do this, e.g. modal filters, Gegenbauer reconstruction, inverse reconstruction, Digital Total Variation (DTV) filters etc. The post-processing technique implemented here is the DTV filter.

The DTV method is a post-processing technique that is applied to the physical-space evaluation of the spectral solution. The DTV filter is a technique that reduces noise in a signal [37, 38]. It has been used successfully in image denoising applications, and even though the Gibbs oscillations are not random noise, they can be filtered as if they were noise. This allows us to recover the accuracy of the solution.

DTV filtering has been applied to direct numerical simulations of conservation laws, where it has been shown that Gibbs oscillations can be removed and that steep gradients can be adequately resolved [39, 40]. Below follows the formulation of the DTV method presented by Sarra (2006) [39]. The Gibbs oscillations, or noise, can be removed by creating a uniform mesh of unfiltered solution values $u^0$, and then solving the unconstrained minimization problem

$$E^{TV}_{\lambda}(u) = \sum_{\alpha\in\Omega} |\nabla_\alpha u|_a + \frac{\lambda}{2}\,\|u - u^0\|^2_{\Omega}, \qquad (37)$$

where $E^{TV}$ is the total variation energy to be minimized, λ is a fitting parameter, ||·|| is the ℓ2-norm, and a is a small parameter that prevents zero-valued terms in the denominator. The result of the minimization problem is a set of nonlinear equations that can be solved with a linearized Jacobi iteration

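$$u^{[n+1]}_\alpha = \sum_{\beta\sim\alpha} h_{\alpha\beta}\,u^{[n]}_\beta + h_{\alpha\alpha}\,u^0_\alpha, \qquad (38)$$

[...] edge in common. The coefficients $h_{\alpha\beta}$ and $h_{\alpha\alpha}$ are defined as

$$h_{\alpha\beta}(u) = \frac{w_{\alpha\beta}(u)}{\lambda + \sum_{\gamma\sim\alpha} w_{\alpha\gamma}(u)}, \qquad (39)$$

$$h_{\alpha\alpha}(u) = \frac{\lambda}{\lambda + \sum_{\gamma\sim\alpha} w_{\alpha\gamma}(u)}, \qquad (40)$$

$$w_{\alpha\beta}(u) = \frac{1}{|\nabla_\alpha u|_a} + \frac{1}{|\nabla_\beta u|_a}, \qquad (41)$$

and the regularized local variation

$$|\nabla_\alpha u|_a = \sqrt{\sum_{\beta\sim\alpha} (u_\beta - u_\alpha)^2 + a^2}. \qquad (42)$$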

From the coefficients and weights above it can be seen that the DTV has edge detection built in. When there is a jump in the data, e.g. a shock or steep gradient section, the weight wαβ(u) will be small compared to λ, which leads to less filtering in that region because hαα(u) is close to unity. The opposite occurs in smooth regions, where wαβ(u) is large compared to λ, thus making hαα(u) small, so that more of the original noisy data is filtered.

For a more in-depth review of the DTV filter see [37, 38, 39, 40]. Here the DTV is applied to the solution in post-processing, and not as a technique to accelerate the GWRM.
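The linearized Jacobi iteration (38) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and not the filter code used for the results: the function names, the default values of `lam`, `a` and `n_iter`, and the periodic nearest-neighbour stencil (via `np.roll`) are choices made here. The self-weight is taken as hαα = λ/(λ + Σ wαγ), as in the digital TV filter of [37].

```python
import numpy as np

def local_variation(u, a):
    """Regularized local variation |grad_alpha u|_a, eq. (42), periodic neighbours."""
    s = np.full_like(u, a * a)
    for axis in range(u.ndim):
        for shift in (-1, 1):
            s += (np.roll(u, shift, axis=axis) - u) ** 2
    return np.sqrt(s)

def dtv_filter(u0, lam=1.0, a=1e-4, n_iter=100):
    """Digital TV filter: linearized Jacobi iteration of eq. (38)."""
    u0 = np.asarray(u0, dtype=float)
    u = u0.copy()
    for _ in range(n_iter):
        inv = 1.0 / local_variation(u, a)
        num = lam * u0                      # lambda * u0_alpha
        den = np.full_like(u, lam)          # lambda + sum of weights
        for axis in range(u.ndim):
            for shift in (-1, 1):
                # w_{alpha beta} of eq. (41) for the neighbour in this direction
                w = inv + np.roll(inv, shift, axis=axis)
                num += w * np.roll(u, shift, axis=axis)
                den += w
        u = num / den                       # weights h sum to one at each node
    return u
```

Since every update is a convex combination of the unfiltered value and the neighbouring values, the filtered data cannot leave the range of the unfiltered data. The function works on arrays of any dimension, so the same sketch applies to a 1D signal or a 2D field.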

7. Results

7.1. Kelvin-Helmholtz instability

The Kelvin-Helmholtz instability (KHI) test case has periodic boundary conditions on all domain boundaries. The initial condition is a fluid with a sharp density gradient between high and low density, which is then perturbed by a shearing motion.

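The domain size is $[-0.5, 0.5]^2$ and the initial conditions are

$$\rho(x, y) = \begin{cases} 1, & |y| > 0.25 \\ 2, & |y| < 0.25 \end{cases} \qquad (43)$$

$$v_x(x, y) = \begin{cases} -0.5, & |y| > 0.25 \\ 0.5, & |y| < 0.25 \end{cases} \qquad (44)$$

$$v_y(x, y) = 0 \qquad (45)$$

$$p(x, y) = 2.5. \qquad (46)$$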


The instability is triggered with an initial perturbation of the velocity vector in both x- and y-directions, ũ = 0.01 sin(4π(0.5 + x)) + 0.001 sin(6π(0.5 + y)). The initial condition features a discontinuity at |y| = 0.25. This is relaxed by introducing a finite gradient. The density at different times can be seen in Figure 1. The simulation used Nx = Ny = 16 subdomains, K = 5 temporal modes, and L = M = 8 spatial modes.
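The initial state (43)-(46) with a relaxed interface can be set up as in the sketch below. The text does not specify how the finite gradient is introduced, so the tanh profile and its `width` parameter are assumptions made purely for illustration, as are the function name and the uniform sampling grid.

```python
import numpy as np

def khi_initial_condition(n=64, width=0.025):
    """Smoothed Kelvin-Helmholtz initial state on [-0.5, 0.5]^2.

    The tanh transition of thickness `width` is an assumed regularization
    of the jump at |y| = 0.25; it is not taken from the paper.
    """
    x = np.linspace(-0.5, 0.5, n, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")
    # s ~ 1 inside the band |y| < 0.25 and s ~ 0 outside
    s = 0.5 * (1.0 + np.tanh((0.25 - np.abs(Y)) / width))
    # Velocity perturbation applied to both components
    pert = 0.01 * np.sin(4 * np.pi * (0.5 + X)) + 0.001 * np.sin(6 * np.pi * (0.5 + Y))
    rho = 1.0 + s                 # 1 outside, 2 inside
    vx = -0.5 + s + pert          # -0.5 outside, 0.5 inside
    vy = pert
    p = np.full_like(rho, 2.5)
    return rho, vx, vy, p
```

A smaller `width` approaches the discontinuous profile and will typically demand more spatial modes to avoid Gibbs oscillations at t = 0.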

Figure 1: Kelvin-Helmholtz instability at time increments of 0.5 (t = 2.0, bottom right) with κ = ν = 1.0e−4. GWRM parameters: Nx = Ny = 16, K = 5, and L = M = 8.

In Figure 1 the Gibbs oscillations can be seen to develop after t = 1.5, and quickly come to dominate the solution at t = 2.0. In order to remove the oscillations present in the solution a digital filter was applied, see Figure 2. The specific digital filter applied in post-processing was the isotropic nuclear total variation filter. Almost all of the oscillations were removed from the solution while accurately resolving the sharp gradient.

Figure 2: Total nuclear variation post-processing (filtered right) on KHI t = 2.0 (non-filtered left). TNV parameters: λ = 0.4 and τ = 0.01.

The AA iterations and time-interval lengths in each time step can be seen in Figure 3 for the Navier-Stokes solution (Nx = Ny = 16 subdomains, K = 5 temporal modes, and L = M = 8 spatial modes) for times t ∈ [1.0, 1.5]. Since explicit finite difference methods typically require 2-5 times more degrees of freedom [2] (choosing 3 here), this amounts to roughly 384 × 384 points for the same accuracy. The time-interval length is highly dependent on the root solver convergence, since the time interval is halved if the root solver does not converge. The AA root solver had a maximum of 200 iterations per time interval. It was found that the GWRM solution had an average time-interval length of roughly ∆t = 0.003-0.004.
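The halving strategy described above can be sketched as follows. The driver `advance` and the callback `solve_interval`, which is assumed to return the new state together with a convergence flag, are hypothetical names. Only the halving on failure is taken from the text; no rule for enlarging the interval again is assumed.

```python
def advance(solve_interval, state, t0, t_end, dt_init, dt_min=1e-8):
    """March from t0 to t_end, halving the time interval when the root solver fails."""
    t, dt = t0, dt_init
    history = []                           # accepted time-interval lengths
    while t < t_end - 1e-14:
        step = min(dt, t_end - t)          # do not overshoot the end time
        new_state, converged = solve_interval(state, t, step)
        if converged:
            state, t = new_state, t + step
            history.append(step)
        else:
            dt = step / 2.0                # retry the same interval with half the length
            if dt < dt_min:
                raise RuntimeError("time interval fell below dt_min")
    return state, history
```

With a toy solver that only converges for intervals of length 0.125 or shorter, a start value of 0.5 is halved twice and the interval [0, 1] is then covered in eight accepted steps.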

The CFL criterion for a two-dimensional problem is $C = \Delta t\,(a_x/\Delta x + a_y/\Delta y)$, where ax and ay are the fastest wave speeds of the solution and ∆x and ∆y are the spatial grid sizes. Since the fastest wave is the sound wave, with speed $a_x = a_y = c_{sound} = \sqrt{\gamma p/\rho} \approx 2.0$, the GWRM time-interval is roughly a factor 4.6-6.14 larger than the time step of an explicit time-marching scheme for the compressible Navier-Stokes equations. Note that the sound speed changes throughout the simulation; for simplicity the sound speed at t = 1.2, approximately csound ≈ 2.0, was chosen.
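The quoted factors can be reproduced with a few lines of arithmetic. The sketch below assumes a CFL number C = 1 for the explicit scheme (the value is not stated in the text) and uses the equivalent grid of 16 × 8 × 3 = 384 points per unit length; the function name is illustrative.

```python
def explicit_dt(cfl, ax, ay, dx, dy):
    """Largest explicit time step allowed by C = dt * (ax/dx + ay/dy)."""
    return cfl / (ax / dx + ay / dy)

n_points = 16 * 8 * 3            # subdomains x spatial modes x factor 3
dx = dy = 1.0 / n_points         # unit domain length
dt_explicit = explicit_dt(1.0, 2.0, 2.0, dx, dy)   # assumed C = 1, c_sound = 2.0

# Ratio of GWRM time-interval length to the explicit time step
ratios = [dt_gwrm / dt_explicit for dt_gwrm in (0.003, 0.004)]
print(dt_explicit, ratios)
```

Under these assumptions the explicit step is 1/1536 ≈ 6.5e−4, and the two GWRM interval lengths give ratios of about 4.61 and 6.14.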

Figure 3: The number of Anderson Acceleration (AA) iterations (left) and time-interval lengths (right) at each time interval. Navier-Stokes solution for t ∈ [1.0, 1.5] with GWRM parameters: Nx = Ny = 16, K = 5, and L = M = 8. Here the absolute norm ||f(xk)||2 was used.

The Q-SIR (w/ "Bad" Broyden inverse update) and the AA method were compared in terms of convergence rate when solving the KHI with GWRM parameters Nx = Ny = 3, K = 5, and L = M = 6 (see Figure 4). The Q-SIR method featured a much faster convergence rate than the AA method for all relaxation parameters, although the AA method used only a fraction of the memory required by Q-SIR. Both methods managed to converge with a time interval of ∆t = 5/100; however, for the time interval of ∆t = 5/50 only the Q-SIR method managed to converge, requiring between 200 and 250 iterations. With the sound speed (at t = 0) ax = ay = 1.3 and ∆x = ∆y = 1/54, the GWRM computed a 70.2 times larger time-step than an explicit method.

Figure 4: Convergence rate of Anderson Acceleration and the Q-SIR method. The Navier-Stokes equations (κ = 1.0e−4) were solved with dt = 5/100 (left) and dt = 5/50 (right) with differing relaxation parameters, β = [1.0, 0.1, 0.01] for AA and R = [0.9, 0.5, 0.1] for Q-SIR. The relative norm ||f(xk)||2/N was used here, where N is the number of algebraic equations.

7.2. Orszag-Tang vortex

The ideal MHD problem presented here is the Orszag-Tang Vortex (OTV) [41]. The OTV starts with smooth non-random initial conditions, whereupon complex nonlinear interactions quickly dominate the MHD plasma. The OTV has several features that make it interesting as a test case for numerical methods, such as turbulence, shock waves, and shock-shock interactions throughout the domain.

The domain size is $[0, 2\pi]^2$ and the initial conditions are

$$[\rho, v_x, v_y, B_x, B_y, p] = [\gamma^2, -\sin(y), \sin(x), -\sin(y), \sin(2x), \gamma], \qquad (47)$$

where γ = 5/3. The boundary conditions are periodic in all directions. The solution of the OTV at t = 1.5 with GWRM parameters Nx = Ny = 30, K = 5, and L = M = 6 can be seen in Figure 5 (left), together with the TNV-filtered solution (right) with parameters λ = 0.4 and τ = 0.01. The shocks are seen to be degraded by the Gibbs oscillations. The effect of the oscillations on the AA iterations and time-interval lengths can be seen in Figure 6.

Figure 5: Isotropic total nuclear variation post-processing on the non-filtered GWRM Orszag-Tang result. Density at t = 1.5. TNV parameters: λ = 0.4 and τ = 0.01.

Figure 6: The number of Anderson Acceleration (AA) iterations (left) and time-interval lengths (right) at each time interval. GWRM parameters: Nx = Ny = 30, K = 5, L = M = 6 and κ = ν = 1.0e−3. The root solver convergence was computed with the absolute norm ||f(xk)||2.

The convergence rates of the AA and Q-SIR methods can be seen in Figure 7. Both methods managed to converge with time-interval lengths ∆t = 3/100 and ∆t = 3/10. The AA method performed similarly to the Q-SIR method for the OTV test case when using GWRM parameters Nx = Ny = 3 and K = L = M = 5.

Figure 7: Convergence rate of Anderson Acceleration and the Q-SIR method. The ideal MHD equations were solved with dt = 3/100 (left ) and dt = 3/10 (right ) with differing relaxation parameters, β = [1.0, 0.1, 0.01] for AA and R = [0.95, 0.5, 0.1] for Q-SIR. The relative norm ||f (xk)||2/N was used here, where N is the number of algebraic equations.

For the OTV test with lower degrees of spatial accuracy (Nx = Ny = 3, K = 5, and L = M = 6) and ax = ay ≈ 3.0 the GWRM managed to take [...] However, this was computed when the solution was smooth and irrespective of temporal accuracy. For higher degrees of spatial accuracy (Nx = Ny = 30, K = 5, and L = M = 6) the GWRM took time intervals roughly 2.57-14.39 times the length of an explicit time step. Here the temporal accuracy far exceeded the pre-set tolerance due to poor root solver convergence, i.e. the time interval is halved when the root solver fails to converge, thus increasing the temporal accuracy. The time-interval length of the GWRM can then be seen to degrade significantly when strong shocks are formed.

7.3. Weiland model: ion-temperature gradient

The Weiland model was simulated in a slab geometry on the outboard side of a tokamak device with the Cyclone base case parameters τ = 1.0, $\epsilon_n$ = 0.909, and ηi = 3.14. The computational domain for the simulation was $D = \{(t, x, y) : t > 0, (x, y) \in [0, 80]^2\}$ with doubly periodic boundary conditions. The subdomains were discretized with four overlapping points for the internal patching conditions between internal domains and also for the external boundary conditions.

The solution of the Cyclone base case with all linear and FLR terms can be seen in Figure 8b. Large structures in the radial direction appear as so-called streamers, also known as convective cells. The maximum growth rate of the solution was γITG = 0.45. The growth rates coincide with previous simulations of the model [46] and the ηi scalings showed the maximum temperature gradient that the plasma can sustain before ITG instabilities begin to grow. It should be noted that the ηi scaling calculated here showed roughly 20% smaller ITG growth rates.

The simulations were run with different random waves as initial conditions and the ITG growth rate varied between runs. The growth rates for the Cyclone base case with differing initial conditions were documented as γITG ∼ 0.34-0.45. The lower values of the growth rates then coincide with experimental values. To verify the results the residuals were monitored and compared with the larger terms in the model.

The critical ηic is small compared to that of fully kinetic models. The upward shift in ηic is called the Dimits shift, and is not included in this version of the Weiland model since it does not include an explicit zonal flow. There are modified versions of the Weiland model that include more physics, such as zonal flows, the trapped electron mode, and parallel dynamics etc.

The AA method failed to converge for the Weiland model, however the Q-SIR method with an approximate initial Jacobian succeeded in converging.

(a) ITG growth rate with varying ηi and $\epsilon_n$ parameters. (b) Cyclone base case t = 20.

Figure 8: GWRM solution of the linear+FLR Weiland model with τ = 1.0, $\epsilon_n$ = 0.909, and ηi = 3.14. GWRM parameters: Nx = Ny = 3, K = 5, and L = M = 10.

The Q-SIR iterations in each time step can be seen in Figure 9. The GWRM with Q-SIR managed to reach time-interval lengths of ∆t = 0.3-0.5. For the Cyclone base case the solution is smooth in the temporal domain, as well as the spatial domain, thus a larger time-interval length can be achieved.

Figure 9: The number of Q-SIR iterations (right) and time-interval lengths (left) at each time interval. GWRM parameters: Nx = Ny = 3, K = 5, and L = M = 10. Cyclone base case parameters: τ = 1, $\epsilon_n$ = 0.909, and ηi = 3.14. The root solver convergence was computed with the absolute norm ||f(xk)||2.

The nonlinear case features inverse and direct energy cascades. This means that small and long wavelengths need to be damped in order to reach a saturated turbulent state. This has been verified by the GWRM simulations; however, attempts at damping the direct energy cascade to longer wavelengths have proven unsuccessful.

7.4. GPU-accelerated modules

The simulations of the 1D and 2D Chebyshev product algorithms were computed on the Tegner system (located at KTH Royal Institute of Technology) with a Tesla K80 GPU and an Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz. The monolithic kernel and the stride kernel have been benchmarked with increasing mode values N and M and then compared with a CPU code with OpenMP instructions. The time it takes to send the data to the GPU was not included in the computation times here since the data structures were small; for larger data structures it should be included for a proper comparison. All of the codes were compiled with the -O3 optimization flag. In Figure 10 the speed-up gains (ratio of CPU computation time to GPU computation time) can be seen.

(a) N Chebyshev modes. (b) N=M Chebyshev modes.

Figure 10: GPU acceleration (relative to parallel CPU) of a 1D (a) and 2D (b) Chebyshev series product algorithm (Tesla K80). The two bars represent a naive monolithic kernel (red) and an optimized striding kernel (green).

In Figure 10a the speed-up gains of the 1D Chebyshev product are presented. For a moderate number of Chebyshev modes both GPU kernels and the CPU function perform similarly. This is expected since the number of operations does not justify sending the data to the GPU when only a few threads will be launched. The GPU kernels do however scale linearly in terms of speed-up gains for the 1D case, so for higher mode numbers N > 200 the GPUs outperform the parallel CPU by several factors.
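For orientation, the operation being benchmarked can be written down directly from the identity T_m T_n = (T_{m+n} + T_{|m−n|})/2. The following serial NumPy sketch is an illustration only and is neither the GPU kernel nor the GWRM module; the function name is hypothetical, and the full product is returned without truncating it to the order of the ansatz.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_product(a, b):
    """Coefficients of the product of two 1D Chebyshev series.

    Uses T_m * T_n = (T_{m+n} + T_{|m-n|}) / 2 for every pair (m, n).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.zeros(len(a) + len(b) - 1)
    for m, am in enumerate(a):
        for n, bn in enumerate(b):
            half = 0.5 * am * bn
            c[m + n] += half          # contribution to T_{m+n}
            c[abs(m - n)] += half     # contribution to T_{|m-n|}
    return c

# Cross-check against NumPy's reference implementation
rng = np.random.default_rng(1)
a, b = rng.standard_normal(6), rng.standard_normal(4)
print(np.allclose(cheb_product(a, b), C.chebmul(a, b)))
```

The double loop performs one multiply-accumulate pair per (m, n) combination, so the work grows with the product of the mode numbers, and the pairs are independent apart from the accumulation into `c`. This is the structure that makes the product a natural candidate for thread-level parallelization.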

Table 1: 2D parallel CPU vs GPU kernels

Name    Chebyshev modes (N = M)    Time (ms)
CPU     100                        170
Mon.    100                        0.044
Str.    100                        0.004
CPU     200                        1093
Mon.    200                        0.048
Str.    200                        0.005

For the 2D case the speed-up gains can be seen in Figure 10b. Here the stride kernel outperforms the CPU in all tests. The monolithic kernel also outperforms the CPU, but higher mode numbers are required (N = M > 20). In these tests the stride kernel speed-up gain scales quadratically and reaches values of several thousand, with the highest speed-up gains > 175,000 for mode numbers N = M > 160.

The computation times [ms] for the 2D case (N = M = 100 and N = M = 200) can be seen in Table 1. The speed-up gains rapidly reach factors over 1000 for the 2D case. The 1D case showed speed-up values up to a factor 10; however, the number of Chebyshev modes had to be increased beyond 500 in order to reach these values. The highest speed-up gain over 20 thousand was documented for the 2D case with K = L = 180 Chebyshev modes. A 2D ansatz with K = L = 180 Chebyshev modes is comparable in size to a 3D ansatz with K = L = M = 30 Chebyshev modes, or a 4D ansatz with K = 10 and L = M = N = 15. Thus, the speed-up gains for higher dimensions can reach orders of several thousands if the same level of parallelization and GPU saturation can be achieved.
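The speed-up factors implied by Table 1, and the sizes of the ansatz spaces compared above, follow from simple arithmetic. The short sketch below only restates the tabulated timings; the variable names are illustrative.

```python
# Timings from Table 1 in milliseconds, keyed by the mode number N = M
times_ms = {
    100: {"CPU": 170.0, "Mon.": 0.044, "Str.": 0.004},
    200: {"CPU": 1093.0, "Mon.": 0.048, "Str.": 0.005},
}

# Speed-up = CPU time / GPU time for each kernel
speedup = {
    n: {kernel: t["CPU"] / t[kernel] for kernel in ("Mon.", "Str.")}
    for n, t in times_ms.items()
}

# Number of coefficients in the 2D, 3D and 4D ansatz mentioned in the text
sizes = {"2D": 180 ** 2, "3D": 30 ** 3, "4D": 10 * 15 ** 3}

for n, s in speedup.items():
    print(n, {k: round(v) for k, v in s.items()})
print(sizes)
```

The three ansatz sizes are 32400, 27000 and 33750 coefficients, i.e. of the same order of magnitude.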

8. Discussion

All applications presented here, namely compressible Navier-Stokes, ideal MHD, and the Weiland model, were run without any digital filter or spatially adaptive mesh during the simulation. This was done in order to establish a baseline performance of the 2D-GWRM for tests with high and low spatial accuracy requirements. Here it was of interest to see how the root solvers and the time-interval lengths were affected for each test case. Another point of interest was to see if a substantial time-interval length can be kept under poor spatial resolution conditions, which can then be remedied in post-processing with a filter of choice.

For the compressible Navier-Stokes test case Gibbs oscillations quickly came to dominate the solution; however, the time-interval lengths were roughly the same throughout the period of decreasing spatial resolution. The opposite was seen in the ideal MHD case, where the time-interval length decreased throughout the simulation. The difference between the two can be explained by the fact that the OTV features shock waves much steeper than the gradients in the KH instability. Thus, in the KH instability case the root solver set a limit on the time-interval length and in the OTV test the shocks degraded the spatial/temporal accuracy.

For simulations with higher degrees of accuracy the Anderson Acceleration method is a necessary choice due to memory requirements. For lower-resolution tests of the drift-wave turbulence and KH instability the Q-SIR method showed better convergence rates compared to the AA method. However, this was not the case for the OTV test. It should be noted, though, that the Q-SIR convergence rate was achieved with an approximate initial Jacobian. The initial Jacobian can be approximated in many ways, e.g. with finite differences, or by simply postulating an identity matrix as the first Jacobian iterate. The choice will affect the convergence rate significantly. When applying the Q-SIR root solver to the algebraic GWRM equations the Jacobian was initialized with boundary conditions only. Since boundary conditions are linear the derivatives can be efficiently computed. The consequence of including the boundary conditions in the Jacobian is an improved convergence rate. It should also be noted that the Q-SIR method can benefit in regards to memory consumption by using recursive algorithms developed for Limited Memory Quasi-Newton methods. However, how the boundary conditions can be included in the recursive limited memory algorithms is unknown.
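To make the role of the initial Jacobian concrete, the sketch below implements a plain Broyden iteration with the "bad" (second) inverse update, starting from an identity matrix unless another initial inverse Jacobian `H0` is supplied. It is not the Q-SIR method, whose semi-implicit formulation is not reproduced here; the function name and arguments are hypothetical.

```python
import numpy as np

def broyden_bad(f, x0, H0=None, tol=1e-10, max_iter=100):
    """Solve f(x) = 0 with Broyden's second ("bad") inverse update.

    H approximates the inverse Jacobian; H0 defaults to the identity.
    """
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size) if H0 is None else np.array(H0, dtype=float)
    fx = f(x)
    for k in range(max_iter):
        if np.linalg.norm(fx) < tol:
            return x, k
        dx = -H @ fx                      # quasi-Newton step
        x_new = x + dx
        f_new = f(x_new)
        df = f_new - fx
        # Rank-one update enforcing the secant condition H_new @ df = dx
        H += np.outer(dx - H @ df, df) / (df @ df)
        x, fx = x_new, f_new
    return x, max_iter
```

The full N × N matrix `H` is stored, which illustrates why the memory footprint of such methods grows quadratically with the number of equations, in contrast to the few vectors kept by Anderson Acceleration. Passing a better `H0`, for instance one that already contains the linear boundary-condition rows, changes only the starting point of the update sequence.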

The root solver tests with variable relaxation parameters, β for the AA method and R for Q-SIR, showed that both had little to no increase in global convergence. For all cases both the AA and Q-SIR method had increased convergence rates with lower relaxation parameters. This showed that the relaxation parameters had a negative effect on the time-interval lengths since they decreased the convergence rates of the root solver.

The test of Chebyshev module GPU acceleration has shown that the product module, being the most computationally expensive, can achieve high levels of speedup gains. This can also be expected for the integral and derivative modules since they are more amenable to parallelization than the product module. It should be noted that the GPU-accelerated modules have not been implemented in the GWRM Maple code. However, the GPU-accelerated test here seeks to address the issue of the computationally intensive Chebyshev product.

Both the monolithic and stride kernel thread strategies performed better than the parallel CPU for higher numbers of Chebyshev modes. The speedup gains increased when going from 1D to 2D, which should continue when going up to 3D and 4D, or until the GPU threads are saturated (i.e. all threads are deployed). Once all the threads in one GPU are deployed, multiple GPUs can work in parallel to further increase the speedup gains.

As shown here, steep gradients have a similar, if not the same, effect on spectral methods as they do on implicit time-marching methods. Both the time-interval length and the root solver convergence rate are affected, notwithstanding the degradation of boundary conditions. Some possible solutions are:

• Reformulate the PDEs to evolve spatially- and temporally-averaged variables in non-smooth regions. This could potentially relax the region of steep gradients and thus allow for large time-intervals. The exact solution can then be recovered in post-processing from the averaged variables.

• Introduce h-p-adaptive mesh routines or hybrid Weighted Essentially Non-Oscillatory (WENO) schemes. This will resolve the shocks; however, the time-interval lengths will likely have to decrease so as to accommodate the moving mesh and physical dynamics.

• As has been stated by other authors, pseudo-spectral methods could benefit from digital filters during the simulation so as to accelerate the method. This can be applied to the temporal domain as well if a time-pseudo-spectral method is developed. The difficulty of applying a digital filter to a purely spectral method is that the spectral coefficients are not amenable to noise filtering as of yet. The initial conditions in space can however be filtered with minimal computational cost in between time steps, although this would require a spectral-space-spectral transformation in each time step.

• Introduce a prism mesh to the 2D-GWRM, i.e. all dimensions (t, x, y) have a non-uniform mesh. The difficulty would be to analytically derive Chebyshev modules to include the transformation. The benefit is that the implementation of the initial and boundary conditions will not change, and interpolation of initial conditions between time-intervals will not be necessary. The idea of using a prism mesh has been implemented for the space-time pseudo-spectral DG method in [23]. If the Chebyshev modules can be reformulated for a prism mesh it would be of interest to compare the two methods.

• Implement root solvers that are more suitable for fast convergence and low memory requirements for large scale nonlinear problems. Preconditioning would most likely benefit the spectral method as it does implicit time-marching methods.

• The convergence rate of the Q-SIR method was greatly increased when boundary conditions were included in the initial Jacobian. This could perhaps also benefit the AA method if the initial minimization problem includes information on the boundary conditions. Furthermore, the Q-SIR method did manage to converge in certain cases with the boundary condition Jacobian without any inverse update. The R factor also improved the global stability of Q-SIR when no inverse updates were included.

9. Conclusion

The time-spectral method has been applied to three test cases: compressible Navier-Stokes, ideal MHD, and an advanced fluid model for plasma turbulence called the Weiland model. The GWRM managed to compute the correct solutions of the Kelvin-Helmholtz instability and Orszag-Tang vortex with high temporal accuracy; however, Gibbs oscillations degraded the maximum time-interval lengths of the method. The Gibbs oscillations were successfully removed and the steep gradients resolved with a digital filter, namely the digital isotropic nuclear variation filter. The digital filter does not recover spectral accuracy; however, it does improve the overall resolution, especially in regions with steep gradients.

The test cases were performed on a uniform grid with two different root solvers. The root solvers chosen here for comparison were the Anderson Acceleration method and a newly developed Quasi Semi-implicit root solver (Q-SIR). The Q-SIR method showed a superior convergence rate (roughly 85% for CNS) and global properties, although the memory consumption of the Q-SIR does not scale favorably for large scale simulations. Thus, the Anderson Acceleration method was chosen for the Kelvin-Helmholtz and Orszag-Tang vortex cases. The AA method is a suitable root solver for the ideal MHD case; however, it did not perform as well for the compressible Navier-Stokes and Weiland model.

For the Ion-Temperature Gradient (ITG) simulation with the Weiland model the GWRM achieved much longer time-interval lengths due to the smoothness of the solution. This makes the GWRM suitable for linear growth rate simulations of a fusion plasma since the initial short wavelength modes can be efficiently averaged in order to calculate the maximum growth rate of the ITG mode. The GWRM was also able to reproduce the parameter scalings of the Weiland model for ITG modes. Including more physics such as zonal flows and Landau damping is required in order to achieve a saturated nonlinear state.

The spectral Chebyshev product has been efficiently parallelized with CPUs and GPUs. This was done for the 1D and 2D product module. A parallel CPU and two GPU thread strategies have been compared, where the GPU-accelerated modules achieved speedup gains of up to 20 thousand. Thus, the time-spectral method has several avenues of parallelization, namely the floating point operations in each Chebyshev module (differentiation, integral, product), and in each individual subdomain. This allows the GWRM to scale favorably on high-performance computers.

To summarize, the GWRM is a highly accurate temporal-spatial numerical method in regions of smooth solutions. Here the GWRM is documented with roughly 2.5-70 times larger time-intervals than an explicit time-step, the lower values documented with solutions containing steep gradients. This makes the method especially apt as a general purpose numerical method, and also as part of a hybrid scheme where a majority of the solution contains smooth solutions. The GWRM can increase the spatial accuracy without degrading the time-interval length. Thus, when the spatial accuracy is sufficient and the root solver converges, the temporal domain computations increase in efficiency.


Appendix A

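Compressible Navier-Stokes (q = 1/ρ): Kelvin-Helmholtz equations

$$\frac{\partial q}{\partial t} = -v_x\frac{\partial q}{\partial x} - v_y\frac{\partial q}{\partial y} + q\frac{\partial v_x}{\partial x} + q\frac{\partial v_y}{\partial y} \qquad (48)$$

$$\frac{\partial v_x}{\partial t} = -v_x\frac{\partial v_x}{\partial x} - v_y\frac{\partial v_x}{\partial y} - q\frac{\partial p}{\partial x} + \kappa\left(\frac{\partial^2 v_x}{\partial x^2} + \frac{\partial^2 v_x}{\partial y^2}\right) \qquad (49)$$

$$\frac{\partial v_y}{\partial t} = -v_x\frac{\partial v_y}{\partial x} - v_y\frac{\partial v_y}{\partial y} - q\frac{\partial p}{\partial y} + \kappa\left(\frac{\partial^2 v_y}{\partial x^2} + \frac{\partial^2 v_y}{\partial y^2}\right) \qquad (50)$$

$$\frac{\partial p}{\partial t} = -v_x\frac{\partial p}{\partial x} - v_y\frac{\partial p}{\partial y} - \gamma p\left(\frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y}\right) \qquad (51)$$

Appendix B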

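Ideal MHD: Orszag-Tang vortex equations

$$\frac{\partial q}{\partial t} = -v_x\frac{\partial q}{\partial x} - v_y\frac{\partial q}{\partial y} + q\frac{\partial v_x}{\partial x} + q\frac{\partial v_y}{\partial y} \qquad (52)$$

$$\frac{\partial v_x}{\partial t} = -v_x\frac{\partial v_x}{\partial x} - v_y\frac{\partial v_x}{\partial y} - q\frac{\partial p}{\partial x} - qB_y\frac{\partial B_y}{\partial x} + qB_y\frac{\partial B_x}{\partial y} \qquad (53)$$

$$\frac{\partial v_y}{\partial t} = -v_x\frac{\partial v_y}{\partial x} - v_y\frac{\partial v_y}{\partial y} - q\frac{\partial p}{\partial y} + qB_x\frac{\partial B_y}{\partial x} - qB_x\frac{\partial B_y}{\partial y} \qquad (54)$$

$$\frac{\partial B_x}{\partial t} = -B_x\frac{\partial v_y}{\partial y} + B_y\frac{\partial v_x}{\partial x} - v_x\frac{\partial B_x}{\partial x} - v_y\frac{\partial B_x}{\partial y} - \frac{\partial \psi}{\partial x} \qquad (55)$$

$$\frac{\partial B_y}{\partial t} = -B_y\frac{\partial v_x}{\partial x} + B_x\frac{\partial v_y}{\partial x} - v_x\frac{\partial B_y}{\partial x} - v_y\frac{\partial B_y}{\partial y} - \frac{\partial \psi}{\partial y} \qquad (56)$$

$$\frac{\partial p}{\partial t} = -v_x\frac{\partial p}{\partial x} - v_y\frac{\partial p}{\partial y} - \gamma p\left(\frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y}\right) \qquad (57)$$

$$\frac{\partial \psi}{\partial t} = -\frac{c_h^2}{c_P^2}\,\psi - c_h^2\left(\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y}\right) \qquad (58)$$

Appendix C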

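$\chi = \nabla^2 \delta T_i$.

$$\frac{\partial \delta T_i}{\partial t} = \left(\frac{2\epsilon_n}{\tau} + \frac{2\epsilon_n}{3\tau^2} - \frac{\eta_i}{\tau}\right)\frac{\partial \phi}{\partial y} + \left(\frac{7\epsilon_n}{3\tau}\right)\frac{\partial \delta T_i}{\partial y} + \left(\frac{2\epsilon_n}{3\tau^2} + \frac{2\epsilon_n}{3\tau} - \frac{2}{3\tau^2} - \frac{2}{3\tau} - \frac{2\eta_i}{3\tau^2}\right)\frac{\partial \psi}{\partial y} + \frac{2\epsilon_n}{3\tau}\frac{\partial \chi}{\partial y} + \frac{\partial \phi}{\partial y}\frac{\partial \delta T_i}{\partial x} - \frac{\partial \phi}{\partial x}\frac{\partial \delta T_i}{\partial y} - \left(\frac{2}{2\tau} + \frac{2}{\tau^2}\right)\frac{\partial \phi}{\partial y}\frac{\partial \psi}{\partial x} + \left(\frac{2}{3\tau^2} - \frac{2}{\tau}\right)\frac{\partial \phi}{\partial x}\frac{\partial \psi}{\partial y} + \frac{2}{3\tau}\left(\frac{\partial \delta T_i}{\partial x}\frac{\partial \psi}{\partial y} - \frac{\partial \delta T_i}{\partial y}\frac{\partial \psi}{\partial x}\right) \qquad (59)$$

$$\frac{\partial}{\partial t}\left(1 - \nabla^2\right)\phi = \left(\frac{\epsilon_n}{\tau} + [\ldots] - 1\right)\frac{\partial \phi}{\partial y} + \epsilon_n\frac{\partial \delta T_i}{\partial y} - \left(\frac{1}{\tau} + \frac{\eta_i}{\tau}\right)\frac{\partial \psi}{\partial y} - \frac{\partial \phi}{\partial y}\frac{\partial \psi}{\partial x} + \frac{\partial \phi}{\partial x}\frac{\partial \psi}{\partial y} - \left(\frac{2}{3\tau} + \frac{2}{3\tau^2}\right)\frac{\partial \phi}{\partial y}\frac{\partial \psi}{\partial x} + \left(\frac{2}{3\tau} + \frac{2}{3\tau^2}\right)\frac{\partial \phi}{\partial x}\frac{\partial \psi}{\partial y} \; [\ldots] \left(\frac{\partial \delta T_i}{\partial x}\frac{\partial \psi}{\partial y} - \frac{\partial \delta T_i}{\partial y}\frac{\partial \psi}{\partial x}\right) \qquad (60)$$

References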

[1] Gander, M., Kwok, F., Numerical analysis of partial differential equations using maple and matlab (fundamentals of algorithms; FA12), Soc. Ind. Appl. Math., 2018.

[2] Gottlieb, D., Orszag, S.T., Numerical analysis of spectral methods: theory and applications, Soc. Ind. Appl. Math., Philadelphia, Pennsylvania, 1977.

[3] Canuto, C., Quarteroni, A., Hussaini, M.Y., Zang, T.A., Spectral methods in fluid dynamics, Springer-Verlag, 1988.

[4] Kopriva, D., Implementing spectral methods for partial differential equations, Springer, Dordrecht, 2009, doi:10.1007/978-90-481-2261-5.

[5] Costa, B., Don, W.S., Multi-domain hybrid spectral-weno methods for hyperbolic conservation laws, J. Comput. Phys., 224, 2007.


[6] Shu, C.-W., High order weighted essentially non-oscillatory schemes for convection dominated problems, SIAM Rev., 51, 2009.

[7] Cockburn, B., Shu, C.-W., TVD Runge-Kutta local projection discontinuous galerkin finite element method for conservation laws II: general framework, Math. Comput., 52, 1989.

[8] Cai, W., Gottlieb, D., Shu, C.-W., Essentially non-oscillatory spectral fourier methods for shock wave calculations, Math. Comput., 52, 389-410, 1989.

[9] Tohidi, E., Application of chebyshev collocation method for solving two classes of non-classical parabolic PDEs, Ain Shams Engineering Journal, 6, 373-379, 2015, doi:https://doi.org/10.1016/j.asej.2014.10.021.

[10] Rasetarinera, P., Hussaini, M.Y., An efficient implicit discontinuous spectral galerkin method, J. Comput. Phys., 172, 718-738, 2001, doi:10.1006/jcph.2001.6853.

[11] Kurihara, Y., On the use of implicit and iterative methods for the time integration of the wave equation, Mon. Wea. Rev., 93, 1965.

[12] Robert, A., Henderson, J., Turnbull, C., An implicit time integration scheme for baroclinic models of the atmosphere, Mon. Wea. Rev., 100, 1972.

[13] Fezoui, L., A class of implicit upwind schemes for euler simulations with unstructured meshes, J. Comp. Phys., 84, 1989.

[14] Tal-Ezer, H., Spectral methods in time for hyperbolic equations, SIAM J. Numer. Anal., 23, 1986.

[15] Tal-Ezer, H., Spectral methods in time for parabolic equations, SIAM J. Numer. Anal., 26, 1989.

[16] Luo, Y., Polynomial time-marching for three-dimensional wave equations, J. Sci. Comp., 12, 1997.

[17] Delic, G., Spectral function methods for nonlinear diffusion equations, J. Math. Phys., 28, 1987.

[18] Dutt, P., Spectral methods for initial boundary value problems - an alternative approach, SIAM J. Numer. Anal., 27, 1990.

[19] Bar-Yoseph, P., Zrahia, U., Space-time spectral element method for solution of second-order hyperbolic equations, Comput. Methods Appl. Mech. Engrg., 116, 1994.

[20] Tang, J.-G., Ma, H.-P., A legendre spectral method in time for first-order hyperbolic equations, Appl. Num. Math., 57, 2007.

[21] Yang, W., Liu, F., Turner, I., Stability and convergence of an effective numerical method for the time-space fractional Fokker-Planck equation with a nonlinear source Term, Int. J. Diff. Eq., 2010, 2009.

[22] Wang, L., Persson, P.-O., A high-order discontinuous Galerkin method with unstructured spacetime meshes for two-dimensional compressible flows on domains with large deformations, Comp. Fluids, 118, 2015.

[23] Cangiani, A., Dong, Z., Georgoulis, E.H., hp-Version space-time discontinuous Galerkin methods for parabolic problems on prismatic meshes, SIAM J. Sci. Comput., 39, 2017.

[24] Pei, C., Sussman, M., Hussaini, M.Y., A space-time discontinuous Galerkin spectral element method for nonlinear hyperbolic problems, Int. J. Comp. Meth., 16, 2019.

[25] Kirk, K.L.A., Horvath, T., Cesmelioglu, A., Rhebergen, S., Analysis of a space-time hybridizable discontinuous Galerkin method for the advection-diffusion problem on time-dependent domains, SIAM J. Num. Anal., 57, 2019.

[26] Scheffel, J., Time-spectral solution of initial-value problems, In C.L. Jang (Ed.), Partial Differential Equations, Nova Science Publishers, Inc. 2011.

[27] Scheffel, J., A spectral method in time for initial-value problems, Amer. J. Comput. Math., 2, 173-193, 2012, doi:10.4236/ajcm.2012.23023.

[28] Riva, F., Milanese, L., Ricci, P., Uncertainty propagation by using spectral methods: a practical application to a two-dimensional turbulence fluid model, Phys. of Plasmas, 24, 2017, doi:10.1063/1.4996445.

[29] Dedner, A., Kemm, F., Kröner, D., Munz, C.-D., Schnitzer, T., Wesenberg, M., Hyperbolic divergence cleaning for the mhd equations, J. Comput. Phys., 175, 645-673, 2002, doi:10.1006/jcph.2001.6961.

[30] Scheffel, J., Håkansson, C., Solution of systems of nonlinear equations - a semi-implicit approach, Appl. Numer. Math., 59, 2009.

[31] Byrd, R.H., Liu, D.C., Nocedal, J., On the behavior of Broyden’s class of quasi-Newton methods, SIAM J. Optim., 2, 1992.

[32] Broyden, C.G., Dennis, J.E., Moré, J.J., On the local and superlinear convergence of quasi-Newton methods, IMA J. Appl. Math., 12, 1973.

[33] Broyden, C.G., On the discovery of the “good Broyden” method, Math. Program., Ser. B, 87, 2000.

[34] Huang, W., Gallivan, K.A., Absil, P.-A., A Broyden class of quasi-Newton methods for Riemannian optimization, SIAM J. Optim., 25, 2015.

[35] Anderson, D.G., Iterative procedures for nonlinear integral equations, J. ACM, 12, 1965, doi:10.1145/321296.321305.

[36] Walker, H., Ni, P., Anderson acceleration for fixed-point iterations, SIAM J. Numer. Anal., 49, 2011.

[37] Chan, T.F., Osher, S., Shen, J., The digital TV filter and nonlinear denoising, IEEE Trans. Image Proc., 10, 2001.

[38] Sarra, S.A., Digital total variation filtering as postprocessing for radial basis function approximation methods, Comp. Math. Applic., 52, 2006.

[39] Sarra, S.A., Digital total variation filtering as postprocessing for Chebyshev pseudospectral methods for conservation laws, Num. Alg., 41, 2006.

[40] Meister, A., Ortleb, S., Sonar, Th., Application of spectral filtering to discontinuous Galerkin methods on triangulations, Numer. Methods Partial Differ. Equ., 26, 2012.

[41] Orszag, S.A., Tang, C.-M., Small-scale structure of two-dimensional magnetohydrodynamic turbulence, J. Fluid Mech., 90, 1977.

[42] Ma, H., Chebyshev-Legendre super spectral viscosity method for nonlinear conservation laws, SIAM J. Numer. Anal., 35, 1998.

[43] Sarra, S.A., Chebyshev super spectral viscosity solution of a two-dimensional fluidized bed model, Int. J. Numer. Methods Fluids, 4, 2002.

[44] Chen, H., Liu, H., Chen, J., Wu, L., Chebyshev super spectral viscosity method for water hammer analysis, Prop. Power Res., 2, 2013.

[45] Henderson, N.C., Varadhan, R., Damped Anderson acceleration with restarts and monotonicity control for accelerating EM and EM-like algorithms, J. Comput. Graph. Stat., 28, 2019.

[46] Weiland, J., Stability and transport in magnetic confinement systems, Springer, New York, 2012.

[47] Nordman, H., Pavlenko, V., Weiland, J., Subcritical reactive drift wave turbulence, Phys. Fluids, 5, 1993.

[48] Weiland, J., Collective modes in inhomogeneous plasma: kinetic and advanced fluid theory, CRC Press, 1999.

[49] Nordman, H., Weiland, J., Jarmén, A., Simulation of toroidal drift mode turbulence driven by temperature gradients and electron trapping, Nucl. Fusion, 30, 1990.

[50] Weiland, J., Nordman, H., Drift wave model for inward energy transport in tokamak plasmas, Physics of Fluids B: Plasma Physics, 5, 1993.