
3. Results

3.4. Software tools to train and apply deep neural nets for sorting

3.4.2. Sorting Software

The software that controls sorting was written by Martin Nötzel. For completeness, I describe here the parts that are important for image-based sorting using DNNs. The software is written in C++, and OpenCV [68] is used for image processing. Frames are continuously retrieved from the high-speed camera at 3000 frames per second. A background image is obtained by computing a rolling average of the last 100 frames and is subtracted from each subsequent frame. After background subtraction, thresholding is applied to binarize the image. Next, dilation and erosion operations are applied, and smooth contours are finally obtained using a contour-finding algorithm [69]. Optionally, computation is stopped at this point if the number of detected contours exceeds a defined number: for sorting, it is advantageous to have only a single object in the region of interest, to avoid sorting cells that are in the proximity of a target cell (see Figure 3.31).

Furthermore, the time difference between the current and the last captured event is tracked, which can be used to prohibit sorting when two events occur in too rapid succession. This option avoids accidental sorting of multiple cells, which is especially helpful when working with samples that tend to form clusters (such as retina).
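The frame-level steps described above (rolling background, subtraction, binarization, object count, and the time-gap check) can be sketched as follows. This is a minimal pure-Python illustration, not the Sorting Software's C++/OpenCV code: frames are nested lists, the absolute difference to the background is assumed to be thresholded, and counting 4-connected foreground blobs stands in for dilation, erosion, and OpenCV contour finding. The class and function names, the threshold, and the minimum time gap `min_gap_s` are hypothetical placeholders for the user-defined settings.

```python
from collections import deque


class RollingBackground:
    """Background = pixel-wise mean of the last `n_frames` frames (100 in the text)."""

    def __init__(self, n_frames=100):
        self.frames = deque(maxlen=n_frames)  # oldest frame drops out automatically

    def update(self, frame):
        self.frames.append(frame)

    def estimate(self):
        n = len(self.frames)
        rows, cols = len(self.frames[0]), len(self.frames[0][0])
        return [[sum(f[r][c] for f in self.frames) / n for c in range(cols)]
                for r in range(rows)]


def binarize(frame, background, threshold):
    """Subtract the background and threshold the absolute difference to 0/1 (assumed)."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]


def count_objects(binary):
    """Count 4-connected foreground blobs (a stand-in for OpenCV contour finding)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            count += 1  # new blob found: flood-fill it so it is counted only once
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


class EventGate:
    """Allow sorting only for single objects that are not too close in time."""

    def __init__(self, max_objects=1, min_gap_s=0.001):
        self.max_objects = max_objects
        self.min_gap_s = min_gap_s  # hypothetical default; user-defined in practice
        self.last_event_s = None

    def allow(self, n_objects, t_s):
        too_close = (self.last_event_s is not None
                     and t_s - self.last_event_s < self.min_gap_s)
        if n_objects > 0:
            self.last_event_s = t_s  # every captured event resets the timer
        return 0 < n_objects <= self.max_objects and not too_close


# Usage: subtract the background built from *previous* frames, then update it.
bg = RollingBackground(n_frames=3)
blank = [[10] * 6 for _ in range(6)]
for _ in range(3):
    bg.update(blank)
frame = [row[:] for row in blank]
frame[2][2] = frame[2][3] = 200  # one small bright object
binary = binarize(frame, bg.estimate(), threshold=30)
bg.update(frame)
gate = EventGate(min_gap_s=0.001)
print(count_objects(binary), gate.allow(count_objects(binary), t_s=0.0))
```

Keeping the gate as a small stateful object mirrors the description: the last event time is updated for every captured event, whether or not it was sorted, so a second cell arriving shortly after a first one suppresses the pulse.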

The contour of each object is used to compute its bounding box, which yields the length, height, and aspect ratio of the object. Computation only continues if the object meets user-defined criteria (minimum and maximum length, height, and aspect ratio), which makes it possible to stop early, for example when debris (small) or a red blood cell (high aspect ratio) is captured. Next, the original image (not the binary one) is cropped to a user-defined region around the center of the bounding box. The cropped image shows the object in the middle and has dimensions matching the requirements of the chosen neural net. The C++ library keras2cpp [161] is used to forward the image through the trained neural net, which returns the probability for each class. A sorting pulse is triggered when the probability for a defined class exceeds a defined threshold. To decrease the computational time, saving of images and parameters is omitted entirely during image-based sorting.
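The per-object steps (bounding-box gating, cropping around the box center, and the probability threshold) can be sketched as below. This is a hypothetical pure-Python illustration: `predict` stands in for the forward pass that keras2cpp performs in the real software, and three details are assumptions not stated in the text: length is taken as the box extent along x (flow direction), the aspect ratio as length/height, and the crop window is shifted inward when the object lies near the image border.

```python
def bounding_box(contour):
    """Axis-aligned box (x, y, w, h) of a contour given as integer (x, y) points."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    x0, y0 = min(xs), min(ys)
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


def passes_size_gate(w, h, limits):
    """Min/max gates on length (w), height (h) and aspect ratio w/h (assumed)."""
    aspect = w / h
    return (limits["min_len"] <= w <= limits["max_len"]
            and limits["min_height"] <= h <= limits["max_height"]
            and limits["min_aspect"] <= aspect <= limits["max_aspect"])


def crop_centered(image, cx, cy, size):
    """Square crop of `size` pixels around (cx, cy), shifted inward at the borders."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    top = min(max(cy - half, 0), rows - size)
    left = min(max(cx - half, 0), cols - size)
    return [row[left:left + size] for row in image[top:top + size]]


def decide_sort(image, contour, limits, predict, target_class, threshold, input_size):
    """Return True if a sorting pulse should be triggered for this object."""
    x, y, w, h = bounding_box(contour)
    if not passes_size_gate(w, h, limits):
        return False  # e.g. debris or a red blood cell: the net is never evaluated
    # Crop the original grayscale image, not the binary mask.
    patch = crop_centered(image, x + w // 2, y + h // 2, input_size)
    probabilities = predict(patch)  # hypothetical stand-in for the keras2cpp call
    return probabilities[target_class] > threshold


# Usage with a dummy classifier returning [p(class 0), p(class 1)].
def dummy_predict(patch):
    return [0.1, 0.9]


limits = {"min_len": 2, "max_len": 10, "min_height": 2, "max_height": 10,
          "min_aspect": 0.5, "max_aspect": 2.0}
image = [[0] * 20 for _ in range(20)]
cell = [(8, 8), (11, 8), (11, 11), (8, 11)]
debris = [(3, 3)]
print(decide_sort(image, cell, limits, dummy_predict, 1, 0.5, 6))    # True
print(decide_sort(image, debris, limits, dummy_predict, 1, 0.5, 6))  # False
```

The ordering matters for speed: the cheap geometric gate runs first, so the comparatively expensive forward pass is only executed for plausible candidates.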

Figure 3.31 Preventing accidental sorting of multiple cells

Sorting pulses are omitted if multiple contours are detected within the region of interest. Scale bar: 20 µm.

3.4.3. Discussion

Before image-based sorting can be performed, a suitable DNN needs to be trained, which can be a substantial challenge because it requires programming skills. The most popular programming language for deep learning is Python, and a large open-source community drives rapid software development. This results in quickly improving software libraries, but also in a risk of losing compatibility with older code and a need for continuous software maintenance. Therefore, I developed AID, a program with a graphical user interface that allows users to perform all steps required to obtain a model that can be used for soRT-FDC. Since AID is packaged as a standalone executable, it runs identically on any PC with Windows 7 or 10, allowing for reproducible analyses. AID provides several DNN architectures (including MLP1 and MLP2), which can be extended and customized. Methods for image augmentation are implemented, and their effect is visualized using example images. Hyperparameters such as image augmentation parameters can be changed during the training process, and their effect on metrics such as accuracy and validation accuracy is displayed immediately in interactive plots. AID eases the application of DNNs for image-based sorting using soRT-FDC because it provides tools to convert final models to a format that is accepted by the Sorting Software.
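To illustrate what image augmentation does to a training image, here is a minimal sketch of two of the operations named in this chapter, random flips and random shifts, written in pure Python on nested lists. It is not AID's implementation (which also applies random rotation, zoom, and shear); the function names, the 50 % flip probability, and the zero fill at the border are assumptions chosen for illustration.

```python
import random


def hflip(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top-bottom (rows are copied so the input is not aliased)."""
    return [row[:] for row in img[::-1]]


def shift(img, dy, dx, fill=0):
    """Translate by (dy, dx) pixels; uncovered pixels are set to `fill`."""
    rows, cols = len(img), len(img[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr, sc = r - dy, c - dx
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = img[sr][sc]
    return out


def augment(img, rng, max_shift=2):
    """Random horizontal/vertical flip followed by a random integer shift."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return shift(img, dy, dx)


# Usage: a fresh random variant of the same training image on every call.
rng = random.Random(42)
example = [[0, 0, 0, 0], [0, 9, 5, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
for _ in range(3):
    print(augment(example, rng, max_shift=1))
```

Printing a few variants of one image, as in the usage lines, is the text-mode analogue of the example images AID shows to visualize the effect of the chosen augmentation parameters.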

Other tools for image processing using machine learning are “Zen Intellesis” from Zeiss and ilastik [162], but both programs use fixed (non-trainable) DNNs for feature extraction and only support training of random forests for image segmentation. The more popular image analysis tools Fiji [163] and ImageJ [152] offer only very limited support for neural nets. DIGITS™ from NVIDIA® provides a GUI and allows training of DNNs, but only very few DNNs are available, the input dimensions are fixed, and access to hyperparameter optimization is limited. The most complete GUI-based program I found is “Deep Learning Studio” (DLS) from DeepCognition, which provides a flexible solution for training DNN models for many use cases (e.g. image classification, image segmentation, and natural language processing). In contrast, AID is optimized only for image classification, since it is intended for image-based sorting. Unfortunately, DLS does not provide solutions to handle unbalanced datasets, which are quite common in biology (for example, red blood cells are orders of magnitude more abundant in blood than white blood cells).

DLS supports training of models on certain GPUs, which is currently not supported by AID (at least not in the standalone executable). Despite being limited to the CPU, training in AID is more than sufficiently fast: for example, training MLP1 and MLP2 for a single iteration using 200,000 images (grayscale, 32x32 pixels) takes 3.6 s and 4.4 s, respectively (on an Intel® Core™ i7-4810MQ @ 2.80 GHz). More computational time (9 s) is actually spent on affine image augmentations (random rotation, shift, zoom, shear, and flip). Therefore, GPU support only becomes attractive when larger neural nets are to be trained, but those are not yet of interest for soRT-FDC because of their long inference times.
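The text notes that unbalanced datasets are common (red blood cells vastly outnumber white blood cells) but this excerpt does not describe how AID handles them. As an illustration only, one common strategy is to draw a fixed number of images per class in every epoch: abundant classes are subsampled and rare classes are oversampled with replacement. The sketch below shows that strategy in pure Python; the function name `balanced_epoch` and its parameters are hypothetical and should not be read as AID's implementation.

```python
import random
from collections import defaultdict


def balanced_epoch(labels, per_class, rng):
    """Indices for one epoch with exactly `per_class` samples from every class.

    Large classes are subsampled without replacement; small classes are
    oversampled with replacement, so rare cells are seen as often as abundant ones.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) >= per_class:
            chosen.extend(rng.sample(pool, per_class))
        else:
            chosen.extend(rng.choices(pool, k=per_class))
    rng.shuffle(chosen)  # mix classes within the epoch
    return chosen


# Usage: 1000 "abundant" images (class 0) vs. 10 "rare" images (class 1).
labels = [0] * 1000 + [1] * 10
rng = random.Random(1)
epoch = balanced_epoch(labels, per_class=50, rng=rng)
print(len(epoch), sum(labels[i] for i in epoch))  # 100 50
```

Redrawing the subset every epoch means that, over many epochs, most images of the abundant class are still used, while the classifier never sees a batch dominated by one class.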

In conclusion, AID helps to accelerate and standardize the process of DNN training. Hence, the time between the first RT-FDC measurement of a sample and the image-based sorting of such a sample using soRT-FDC is shortened. AID allows anyone to use deep learning methods and train DNNs, which extends the accessibility of DNN-based image analysis and image-based sorting to non-programmers. The combination of AID and basic MLP architectures allows for very fast training of models. Measuring a sample, training a neural net, and sorting using that neural net is a routine that can be conducted by one person within a single day. To demonstrate this, the next section presents two examples of sorting experiments in which AID was used to train the DNN.


In document Real-time image-based cell identification (Page 123-127)