Segmentation of video frames into binary images is the first step of the tracking system and is used to extract measurements from image frames. Segmentation, when used, is a very important stage in image processing as it determines which features can be extracted from the image, impacting later stages of processing. When tracking objects using video, good segmentation produces better object measurements which can then be fed into the tracking system. The importance of segmentation is further highlighted in Section 5.6 where the accuracy of the system is investigated.
Five segmentation methods are evaluated: global thresholding, adaptive (local) thresholding, edge detection, background subtraction using a median computed over time, and background subtraction using mean background estimation. The first three methods work on individual frames while the background subtraction methods require a sequence of frames.
5.2.1 Global Thresholding
Global thresholding used a histogram equalisation procedure followed by thresholding using Otsu's method (Otsu, 1979) as explained in Subsection 3.2.2. The histogram was created using 64 bins. The default threshold provided by Otsu's method proved inadequate, so it was multiplied by 0.25 (determined empirically) to produce better results. The need for such a significant adjustment may be due to the fact that Otsu's method works best on bimodal histograms, and the underwater images used during this research were in general not bimodal.
The method had the advantage of simplicity and therefore speed: the average processing rate was 35 fps. It produced acceptable results when the density of fish was low, detecting 5-15 sample fish per minute. This was the lowest detection rate of the methods tried, and the method often could not detect whole fish shapes. Its low computational cost made it suitable for real-time deployment.
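The procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Simulink implementation used in this research. It assumes pixel intensities normalised to [0, 1], stored as a flat list, and it assumes that pixels brighter than the threshold are treated as foreground. The function names and the `scale` parameter are hypothetical; the 64 bins and the 0.25 multiplier are taken from the text.

```python
def histogram(pixels, bins=64):
    """Count pixels (intensities in [0, 1]) in equally spaced bins."""
    counts = [0] * bins
    for p in pixels:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def equalise(pixels, bins=64):
    """Histogram equalisation: map each pixel to the cumulative fraction of its bin."""
    counts = histogram(pixels, bins)
    total = len(pixels)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    return [cdf[min(int(p * bins), bins - 1)] for p in pixels]


def otsu_threshold(pixels, bins=64):
    """Return the level in [0, 1] that maximises the between-class variance."""
    counts = histogram(pixels, bins)
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(counts))
    best_bin, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of bin indices in the lower class
    for t in range(bins - 1):
        w0 += counts[t]
        sum0 += t * counts[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_bin = between, t
    return (best_bin + 1) / bins  # upper edge of the last lower-class bin


def global_threshold(pixels, scale=0.25, bins=64):
    """Equalise, compute Otsu's level, scale it, and binarise.

    Assumption: foreground pixels are those brighter than the scaled level.
    """
    eq = equalise(pixels, bins)
    level = otsu_threshold(eq, bins) * scale
    return [1 if p > level else 0 for p in eq], level
```

Calling `global_threshold(pixels)` returns the binary mask together with the scaled threshold, so the effect of the empirical multiplier can be inspected by comparing against `scale=1.0`.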
Figure 5.1: Simulink-based tracking system to analyse fish movement in aquaculture sea cages. Frames per second is shown at the top left of the model (33.2854 fps).
5.2.2 Adaptive (Local) Thresholding
Adaptive thresholding, also explained in Subsection 3.2.2, produced on average 40-70 samples per minute but at a lower frame rate than global thresholding (22 fps). The higher computational cost is due to the median filter, whose execution speed depends on the size of the neighbourhood. The quality of fish shapes extracted using adaptive thresholding was better during the day because the method coped better with illumination changes within an image frame. Night-time recordings produced lower quality samples because the light was oriented sideways relative to the camera.
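The idea of thresholding against a local median can be illustrated with the following sketch. It is not the implementation from Subsection 3.2.2: the decision rule (a pixel is foreground when it exceeds its neighbourhood median by more than an offset) and the parameters `radius` and `offset` are assumptions made for illustration. The nested loop over the neighbourhood also shows why the cost grows with the neighbourhood size.

```python
from statistics import median


def adaptive_threshold(image, radius=1, offset=0.1):
    """Binarise a 2-D image (list of rows) against its local median.

    Assumption: a pixel is foreground when it is brighter than the median
    of its (2 * radius + 1) square neighbourhood by more than `offset`.
    Neighbourhoods are clipped at the image borders.
    """
    rows, cols = len(image), len(image[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            if image[r][c] - median(window) > offset:
                mask[r][c] = 1
    return mask
```

Because each pixel is compared with its own surroundings, a smooth illumination gradient across the frame does not trigger detections, whereas a single global threshold would mark the brighter side of the gradient as foreground.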
5.2.3 Edge Detection
When dealing with small fish, the thresholding methods often failed to detect any shapes and the overall outcome was a low detection rate. To improve the detection rates of small fish, edge based segmentation techniques were examined. The edge segmentation proved to be superior in terms of number of detections in a given period. While it did not always pick up whole shapes of fish, it was successful at segmenting the length of the fish even if the middle of the fish was not segmented completely. (Extraction of the length of the fish was the main objective of the segmentation process because this measurement was used in the calculation of speed in body lengths per second.) Two edge detection methods were investigated: Sobel and Canny. While producing similar outcomes, Sobel was computationally faster so it was the preferred method.
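A minimal sketch of Sobel edge segmentation is given below for illustration. It uses the standard 3x3 Sobel kernels and thresholds the gradient magnitude; the `threshold` value and the decision to leave the one-pixel image border unmarked are assumptions, not settings reported in this research.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(image, threshold=0.5):
    """Return a binary edge map of a 2-D image (list of rows).

    Assumption: a pixel is an edge when its gradient magnitude exceeds
    `threshold`; border pixels are left as 0 because the kernel does not fit.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    p = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            if math.hypot(gx, gy) > threshold:
                edges[r][c] = 1
    return edges
```

Only two small fixed convolutions and one magnitude computation are needed per pixel, which is consistent with Sobel being cheaper than Canny, since Canny adds smoothing, non-maximum suppression and hysteresis on top of a gradient computation.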
5.2.4 Median Background Estimation
The median background estimation method estimates the background by computing the median image over 50 frames. The current frame is then subtracted from the background and the absolute difference is thresholded. This method is effective when the fish density is lower, the fish are smaller and they tend to be further away from the camera. These conditions allow the background to be estimated successfully, so motion segmentation can yield a reasonable number of fish for the tracking system. The main problem with this method is camera movement, which quickly invalidates the current background. This creates false measurements, which may be tracked if they persist for a significant period of time. The background estimation can be reset to help minimise the effects of camera movement, but this will not eliminate them completely. Also, frequent resets require additional computational resources (mainly due to median calculations) and during this time the frame rate slows down significantly.
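The two steps, per-pixel median over a buffer of frames followed by thresholding of the absolute difference, can be sketched as follows. The function names and the `threshold` value are hypothetical; in the system described here the buffer would hold 50 frames.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median over a buffer of 2-D frames (each a list of rows)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


def subtract_background(frame, background, threshold=0.2):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]
```

A pixel that is covered by a passing fish in fewer than half of the buffered frames keeps its true background value in the median, which is why the method works best at low fish density. The per-pixel sort is also what makes each reset expensive.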
5.2.5 Mean Background Estimation
The mean background estimation method estimates the background using the running mean over all frames processed to that point by the system. This simple method was chosen as a comparison to the median method. It provides similar segmentation capability and suffers the same problems due to camera movement. However, it avoids the computation of the median and is therefore more computationally efficient.
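A running mean can be updated incrementally, without storing past frames, which is the source of the efficiency gain. The sketch below illustrates this under stated assumptions: the class name, the `foreground` helper and its `threshold` value are hypothetical and are not taken from the implemented system.

```python
class RunningMeanBackground:
    """Running mean of all frames seen so far, updated one frame at a time."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, frame):
        """Fold a new 2-D frame (list of rows) into the running mean."""
        self.count += 1
        if self.mean is None:
            self.mean = [list(row) for row in frame]
        else:
            for mrow, frow in zip(self.mean, frame):
                for c, p in enumerate(frow):
                    # Incremental mean: m_n = m_(n-1) + (x_n - m_(n-1)) / n
                    mrow[c] += (p - mrow[c]) / self.count
        return self.mean

    def foreground(self, frame, threshold=0.2):
        """Threshold the absolute difference between frame and background."""
        return [
            [1 if abs(p - m) > threshold else 0 for p, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, self.mean)
        ]
```

Each update costs one subtraction, one division and one addition per pixel, compared with sorting a buffer of values per pixel for the median.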
5.2.6 Day/Night Detection
Special consideration was given to recordings which included both day and night footage (LL-photoperiod manipulation) because light sources and therefore backgrounds were different (natural light from above during the day, artificial light positioned sideways during the night). A simple indicator was developed to automatically distinguish night from day:
\[
\text{Day/Night Indicator} =
\begin{cases}
\text{night time} & \text{if } m_k > 0 \\
\text{day time} & \text{if } m_k \le 0
\end{cases}
\tag{5.1}
\]

where $m_k$ is a variable which increments or decrements based on another variable $s$, which describes the relationship between the mean and variance calculated for each image frame. Both variables are calculated using the following equations:

\[
m_k =
\begin{cases}
m_{k-1} + 1 & \text{if } s = 0 \\
m_{k-1} - 1 & \text{if } s = 1
\end{cases}
\tag{5.2}
\]

\[
s = (\bar{x} < M_{high} \text{ AND } \sigma^2 < V_{high}) \text{ OR } \bar{x} < M_{low} \text{ OR } \sigma^2 < V_{low}
\tag{5.3}
\]

where $M_{high}$ and $M_{low}$ are high and low thresholds for the mean (0.525 and 0.4 respectively), and $V_{high}$ and $V_{low}$ are high and low thresholds for the variance (0.035 and 0.025 respectively). These values have been determined empirically for optimal differentiation between night and day.
The variable $m_k$ increments or decrements over time but can never exceed the
This metric allows a switch between two segmentation methods: one for daytime recordings (Adaptive Thresholding) and one for night-time recordings (Median Background Estimation). Calculation of the night-time metric can be disabled if only daytime footage is being analysed.
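The indicator of Equations 5.1 to 5.3 can be sketched as a small stateful counter. The sketch follows the sign convention of the equations exactly as printed and uses the threshold values given in the text. Several details are assumptions: pixel intensities are taken to be normalised to [0, 1], the variance is computed as a population variance, and the optional symmetric bound `limit` on the counter is a hypothetical placeholder, since the value of the bound is not stated here.

```python
from statistics import fmean, pvariance

M_HIGH, M_LOW = 0.525, 0.4    # mean thresholds from the text
V_HIGH, V_LOW = 0.035, 0.025  # variance thresholds from the text


def frame_flag(pixels):
    """Equation 5.3: s is 1 when the frame statistics fall below the thresholds."""
    mean = fmean(pixels)
    var = pvariance(pixels, mean)  # assumption: population variance
    return int((mean < M_HIGH and var < V_HIGH) or mean < M_LOW or var < V_LOW)


class DayNightIndicator:
    """Counter m_k of Equation 5.2 with the decision rule of Equation 5.1."""

    def __init__(self, limit=None):
        self.m = 0
        self.limit = limit  # hypothetical bound on |m_k|; None disables it

    def update(self, pixels):
        s = frame_flag(pixels)
        self.m += 1 if s == 0 else -1
        if self.limit is not None:
            self.m = max(-self.limit, min(self.limit, self.m))
        return "night" if self.m > 0 else "day"
```

Because the decision depends on the accumulated counter rather than on a single frame, an isolated frame with unusual statistics does not flip the selected segmentation method. Bounding the counter limits how many frames are needed to switch after a genuine change.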
5.2.7 Object Extraction
Once the binary image is created, shape features are extracted using SIMULINK's Blob Analysis block. Prior to this step, thick borders are drawn around all edges of the image (with the Blob Analysis block set to ignore objects touching the borders of the image). This is to prevent detection when fish are appearing or disappearing from the field of view. During this time fish shapes are increasing or decreasing rapidly in size and this may affect the final tracking estimates. The Blob Analysis block is set to extract up to 20 shapes and outputs the following data for each: centroid co-ordinates, bounding box size, major and minor axis length, orientation, eccentricity (defined as the ratio of the distance between the foci of the ellipse and its major axis length (Mathworks, 2009)), number of extracted shapes and a binary image of extracted shapes. From the tracking perspective, the centroid co-ordinates are the most important because they determine the velocity of fish movement. However, other variables play a role in the data association by influencing the value of the validation gate prior to association.
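The border rule and the centroid output can be illustrated with a simple connected-component sketch. This is not the Blob Analysis block itself: it assumes 8-connectivity, discards border-touching shapes directly rather than drawing a border first, and reports only the centroid, bounding box and area. The function name and the dictionary keys are hypothetical; the limit of 20 shapes is taken from the text.

```python
from collections import deque


def extract_blobs(mask, max_blobs=20):
    """Label 8-connected shapes in a binary mask and describe each one.

    Shapes touching the image border are discarded, mimicking the rule that
    fish entering or leaving the field of view are not measured.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill collects all pixels of one shape.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            if min(rs) == 0 or min(cs) == 0 or max(rs) == rows - 1 or max(cs) == cols - 1:
                continue  # touches the image border
            blobs.append({
                "centroid": (sum(rs) / len(rs), sum(cs) / len(cs)),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),
                "area": len(pixels),
            })
            if len(blobs) == max_blobs:
                return blobs
    return blobs
```

The centroid returned for each shape is the measurement that a tracker would consume from frame to frame, while the bounding box and area are examples of the additional shape features that can inform gating.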