Generalization by Randomness - Training of a Neural Network

6.2 Training of a Neural Network


The trainable parameters γ, β are adjusted per batch while the constant ϵ guarantees numerical stability.

6.2.3 Generalization by Randomness

Randomness is a key aspect of the generalization capabilities of DNNs. It appears in a variety of strategies, the most important of which are explained in this section to clarify the difference between learned generalization and memorization during training, i.e. overfitting.

For a DNN with unchanged architecture, two training runs will not proceed in exactly the same way. One reason for this is how the data handling routines provide the training and validation data during training. Typically, a batch of predefined size is randomly drawn from the data collection [Bro16a].
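The random drawing of a batch can be sketched as follows. This is a minimal NumPy illustration, not the data handling routine used in this work; the function name `draw_batch` and the toy data collection are assumptions made for the example.

```python
import numpy as np

def draw_batch(data, labels, batch_size, rng):
    """Draw one batch of distinct samples at random from the data collection."""
    idx = rng.choice(len(data), size=batch_size, replace=False)
    return data[idx], labels[idx]

# Hypothetical toy collection: 10 samples with 3 features each.
# The labels simply store the sample index so the draw can be inspected.
data = np.arange(30, dtype=float).reshape(10, 3)
labels = np.arange(10)

rng = np.random.default_rng()  # unseeded generator: the draw differs from run to run
batch_x, batch_y = draw_batch(data, labels, batch_size=4, rng=rng)
print(batch_x.shape, batch_y)
```

Because the generator is not seeded, repeated executions generally select different samples, which is one of the reasons why two training runs differ.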

Furthermore, randomness is introduced during initialization of the trainable parameters, i.e. the weights. If every weight on the neurons’ connections in the DNN were the same, all neurons in a layer would compute the same output. This symmetry leads to the same gradients during backpropagation, and therefore all parameters undergo exactly the same updates. In other words, there is no source of asymmetry between neurons if their weights are initialized to exactly the same values. At the same time, the weights should be very close to zero so as not to emphasize any input right from the beginning, but not identically zero. It is therefore common to initialize the weights to small random numbers, which is referred to as symmetry breaking. In this way, a random state of the whole DNN is established at the start of training [Joh17; Bro16a].
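The symmetry argument can be checked numerically on a toy network. The following sketch assumes a single hidden layer with tanh activation, a linear scalar output and a squared-error loss; the network, the function name `hidden_weight_gradient` and all numerical values are illustrative assumptions and are not taken from this work.

```python
import numpy as np

def hidden_weight_gradient(W1, w2, x, y):
    """Gradient of 0.5 * (y_hat - y)**2 with respect to W1,
    for the toy network y_hat = w2 . tanh(W1 x)."""
    h = np.tanh(W1 @ x)                        # hidden activations, shape (n_hidden,)
    y_hat = w2 @ h                             # scalar network output
    delta = (y_hat - y) * w2 * (1.0 - h**2)    # backpropagated error per hidden neuron
    return np.outer(delta, x)                  # shape (n_hidden, n_inputs)

x = np.array([0.5, -1.0, 2.0])
y = 1.0

# Case 1: all weights identical, so every hidden neuron receives the same gradient.
g_same = hidden_weight_gradient(np.full((4, 3), 0.1), np.full(4, 0.1), x, y)

# Case 2: small random weights break the symmetry.
rng = np.random.default_rng(0)
W1_rand = rng.normal(scale=0.01, size=(4, 3))
w2_rand = rng.normal(scale=0.01, size=4)
g_rand = hidden_weight_gradient(W1_rand, w2_rand, x, y)

rows_identical_same = np.allclose(g_same, g_same[0])
rows_identical_rand = np.allclose(g_rand, g_rand[0])
print(rows_identical_same, rows_identical_rand)
```

With identical weights, each row of the gradient (one row per hidden neuron) is the same, so the neurons remain copies of each other after every update. With small random weights the rows differ and the neurons can specialize.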

The idea of generalization in supervised learning is inevitably connected to the bias-variance dilemma, which describes the trade-off between accurate modeling and generalization [Bro16b]. Since natural collections of training data contain fluctuations, a non-zero variance always exists. A low or high bias counteracts a high or low variance, hence biases are essential trainable parameters in DNNs [Bro16b]. A bias as introduced in Equation 6.6 adds an additional input to a neuron, allows its activation even in case of zero input on all other connections, and makes it easier for neurons to saturate [Nie15]. Its noise-like influence is almost complementary to dropout and fosters generalization [Bro16b]. During training, the optimizer adjusts the bias values to influence the model complexity. To avoid assumptions on the model before training, biases may be initialized with zero, which is, e.g., the default in Keras [Cho+15]. An account of the bias-variance dilemma is given in Appendix A.3 for further reading.

With regard to the above, a remark on reproducibility has to be made. Seeking generalization through the mentioned strategies that invoke randomness intuitively prevents reproducibility. However, reproducibility is an important concept for evaluating and verifying scientific work. In order to reduce variance in training when reproducibility is to be tested, the generators used for (pseudo-)random initialization are fed with identical seed values.
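The effect of identical seed values can be illustrated with a small sketch. It uses a NumPy generator as a stand-in for the generators of the deep learning framework; the function name `init_weights` and the seed values are assumptions made for the example.

```python
import numpy as np

def init_weights(seed, shape=(3, 2), scale=0.01):
    """Small random initial weights drawn from a generator with a fixed seed."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=scale, size=shape)

w_run1 = init_weights(seed=42)
w_run2 = init_weights(seed=42)  # same seed: identical initial state
w_run3 = init_weights(seed=43)  # different seed: different initial state

print(np.array_equal(w_run1, w_run2), np.array_equal(w_run1, w_run3))
```

In a full training setup, every source of randomness (initialization, batch drawing, dropout) would have to be seeded in the same way, and some GPU operations may remain non-deterministic even then.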


7 Materials and Methods: Neural Networks for SMS Reconstruction

The following chapter moves the focus from the general introduction of NNs and DNNs to the materials, in the form of software tools and the database, and the methods, i.e. the developed DNN architectures and conducted experiments, used throughout this work.

Relation of SMS Reconstruction Methods to SMSnet: The efforts to accelerate MRI measurements lie mainly in three areas: the signal processing for reconstruction, the scanner hardware and the physics behind an imaging sequence. As introduced in Chapter 3, established reconstruction methods for accelerated MRI commonly exploit the spatial information inherent to the MR machine itself, i.e. from multi-coil receiver arrays, to recover omitted data after the measurement [Bla+04; Lar+01]. Two main strategies have been established and have coexisted for the past 20 years: image-domain-based approaches such as SENSE [Pru+99], which utilize the coil sensitivities explicitly, and methods such as simultaneous acquisition of spatial harmonics (SMASH) [SM97] or GRAPPA [Gri+02], which extract the missing information implicitly from k-space correlations in the coil elements [Uec+14]. Although these methods share the same principles, sub-types of these strategies have been adapted to different problems. Specific tasks are, for example, the recovery of in-plane undersampled data, typically along the PE direction, the reconstruction with an undersampling along the slice direction in SMS, or a combination of both. While SENSE- and GRAPPA-like techniques basically solve an ill-conditioned linear system by numerical inversion, other approaches accomplish reconstruction by iterative nonlinear inversion techniques, e.g. regularized nonlinear inversion (NLINV) for PI [Uec+08] and SMS-NLINV in the case of SMS [Ros+17]. Both linear and nonlinear techniques use reference data to estimate the coil sensitivities, even though the density distribution of these data (along k-space trajectories), their point of acquisition and their processing in the reconstruction differ.

A completely novel approach is presented here [WE16]. Instead of solving the problem of overlapping image content by a deterministic algorithm, a trained DNN recovers the separated slices. A new network architecture referred to as SMSnet was developed for this task. Generally speaking, SMSnet is designed and trained in such a way that it learns information about the imaging system as one unit incorporating the hardware, i.e. the MR machine, and the image processing. The properties learned are, in the case of SMS, probably best associated with the CoSs. The dielectric properties of the object under investigation directly influence the CoSs [Uec+08]; therefore, a conventional calculation of a universal profile cannot be done a priori. Even for objects which are often positioned similarly in an identical receiver array, e.g. the head coil, individual reference data are required to explore the CoSs and to compensate for these changes [Uec+08].

Here, a DNN interprets CoSs as a feature to be extracted. The generated output masks are merged with the preprocessed SMS image data in an extended channel domain. Both the path to generate the masks and the path to handle the image data receive only SMS data. In particular, no SB or ACS data, neither in the image domain nor in the k-space domain, are provided to the DNN. SMSnet learns an input-output mapping during supervised training; therefore, pairs of source data (in k-space and image domain) and target data (image domain) are available at the network’s inputs and output.

7.1 Software and Architecture

The design of a NN is often motivated by existing architectures for related tasks or classical algorithms for the governing question. A combination of both approaches inspired the model designed in this work. The final architecture of SMSnet was mainly inspired by deterministic PI and SMS reconstruction methods.

The Python-based deep learning library Keras was used [Cho+15]. Its layer-based organization offers versatile types of layers and various derivatives of the most common types. Keras’ high-level application programming interface (API) allows rapid development and prototyping of NNs and can be run with a TensorFlow backend. Layer parameters, such as the type of activation function or the dimensionality, as well as hyper-parameters for training, are specified during model definition. The compiled model can be saved for archiving or re-training. Jupyter Notebook was employed for programming, for running trainings and for saving evaluations [Jup18].

At this point, relevant types of layers will be introduced. The developed architecture will be described later on in Section 7.2.

• A fully connected layer connects all $k$ inputs $x^n = \{x^n_1, \dots, x^n_k\}$ of the $n$-th layer to all $l$ activation functions, whose results are passed to all $l$ neurons in the $(n+1)$-th layer (Fig. 6.3). For the 2D case, as typical for images, the constructed tensors have four axes: $[N_b, N_x, N_y, N_c]$, where the $N_b$ images, i.e. the batch size, of size $N_x \times N_y$ are loaded simultaneously to the GPU. $N_c$ specifies the number of channels, or features. Keras allows a full connection limited to the channel axis only, resulting in $N_c^n \times N_c^{n+1}$ connections as illustrated in Figure 7.1. The shape of the tensor’s first three axes remains unchanged for this type of fully connected layer.

• NNs containing 2D convolutional layers are favored for many tasks involving images and are often called convolutional NNs or, if all connections in the NN are convolutions, fully convolutional NNs. Convolutional NNs perform well on (natural) signals for several reasons: their sensitivity to local connections, i.e. patches of an image; the weights shared across the receptive field; their pooling properties, by which neighboring inputs are fed as a pooled input to the next layer; and the reduced number of trainable parameters, which allows deep network architectures [LYH15]. The detection of image features such as edges is not locally limited, as the convolution kernel is slid across the complete plane of the input, as shown in Figure 7.2. This shift invariance is another important property of convolutional layers in the domain of image processing.

• BN layers in Keras normalize the previous layer’s output. A common choice when applying BN to a four-dimensional tensor is to normalize mean and variance over the first three dimensions, separately for each channel (Eq. 6.24).

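• A dropout layer randomly drops a specified fraction $r_\mathrm{drop} \in [0, 1]$ of all input connections and passes the remaining connections to the next layer. In Keras, the remaining values are scaled by $1/(1 - r_\mathrm{drop})$ during training so that the expected sum of the inputs is unchanged.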

• A 2D upscaling layer repeats the rows and columns, i.e. the pixels along $N_x$ and $N_y$, of the data as defined by the size parameter. For example, a size of (2, 2) doubles the extent of the input along both axes.
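The tensor operations behind four of the layer types listed above (channel-wise fully connected, BN, dropout and 2D upscaling) can be sketched in NumPy. These are illustrative re-implementations under stated assumptions, not the Keras layers themselves: the function names are hypothetical, the value of `eps` is assumed, each element is dropped independently with probability `rate`, and the `rescale` flag selects whether the kept values are divided by (1 - rate), which requires rate < 1.

```python
import numpy as np

def channel_dense(x, weights, bias):
    """Fully connected layer acting on the channel axis only.
    x: (N_b, N_x, N_y, C_in), weights: (C_in, C_out), bias: (C_out,)"""
    return x @ weights + bias  # matmul broadcasts over the first three axes

def batch_norm(x, gamma, beta, eps=1e-3):
    """Normalize mean and variance over the first three axes, per channel."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def dropout(x, rate, rng, rescale=True):
    """Zero each element independently with probability `rate` (rate < 1)."""
    keep = rng.random(x.shape) >= rate
    out = x * keep
    return out / (1.0 - rate) if rescale else out

def upsample_2d(x, size=(2, 2)):
    """Repeat rows and columns along the N_x and N_y axes."""
    return np.repeat(np.repeat(x, size[0], axis=1), size[1], axis=2)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4, 3))  # [N_b, N_x, N_y, N_c]

y_dense = channel_dense(x, rng.normal(size=(3, 5)), np.zeros(5))
y_bn = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
y_drop = dropout(x, rate=0.5, rng=rng)
y_up = upsample_2d(x, size=(2, 2))

print(y_dense.shape, y_bn.shape, y_drop.shape, y_up.shape)
```

The channel-wise fully connected layer changes only the last axis (3 to 5 channels here) and uses 3 × 5 weights, independent of the image size. The upscaling layer changes only the two spatial axes.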

Figure 7.1: A fully connected layer can maintain its 2D shape in Keras. The full connections are between the channels of successive layers (layer n and layer n + 1) as illustrated. No x-y specific weights are involved.


Figure 7.2: The receptive field is similar to a convolution kernel, here 2D, which is slid across the input. Pixel values at the input pass through weighted connections whose contributions accumulate in an activation function. Sliding the receptive field in layer n, as indicated by arrows, yields an output pixel at the corresponding position in layer n + 1, also indicated by arrows. The results of different kernels, i.e. kernels with different trainable weights, are stacked along the channel dimension of the output. The number of output channels is also called the number of filters.
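The sliding of the receptive field can be sketched as a plain NumPy loop. This is a minimal illustration for a single-channel input with stride 1 and without padding or activation function. As is common in deep learning libraries, the kernel is not flipped, so strictly speaking a cross-correlation is computed. The function name `conv2d_valid`, the toy images and the two hand-picked kernels are assumptions; in a trained network the kernel weights are learned.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Slide each kernel over a single-channel image (stride 1, no padding).
    image: (N_x, N_y), kernels: (K_x, K_y, n_filters)
    returns: (N_x - K_x + 1, N_y - K_y + 1, n_filters)"""
    k_x, k_y, n_filters = kernels.shape
    out_x = image.shape[0] - k_x + 1
    out_y = image.shape[1] - k_y + 1
    out = np.zeros((out_x, out_y, n_filters))
    for i in range(out_x):
        for j in range(out_y):
            patch = image[i:i + k_x, j:j + k_y]  # current receptive field
            # Weighted sum of the patch, computed for all kernels at once.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1]))
    return out

# Two toy images with the same vertical edge at different positions.
image = np.zeros((4, 6))
image[:, 3:] = 1.0
shifted = np.zeros((4, 6))
shifted[:, 4:] = 1.0

# Two hand-picked 1x2 kernels: a horizontal difference and a local average.
kernels = np.stack([np.array([[-1.0, 1.0]]), np.array([[0.5, 0.5]])], axis=-1)

resp = conv2d_valid(image, kernels)
resp_shifted = conv2d_valid(shifted, kernels)
edge_pos = int(np.argmax(resp[0, :, 0]))
edge_pos_shifted = int(np.argmax(resp_shifted[0, :, 0]))
print(resp.shape, edge_pos, edge_pos_shifted)
```

The same difference kernel responds to the edge in both images, and the position of the response moves together with the edge. The outputs of the two kernels occupy separate channels of the result.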

In document New Approaches to Simultaneous Multislice Magnetic Resonance Imaging : Sequence Optimization and Deep Learning based Image Reconstruction (Page 102-108)