• No results found

Data of Video Scenes Pre-processing

4.4 Experiments and Results

4.4.1 Data of Video Scenes Pre-processing

We performed various experiments using the selected data from over 30 football videos from the English, Spanish, German and French soccer leagues as our database. All videos had a resolution of 1280*720 pixels with frame rates of up to 25 frames/second. First, we separated the soccer match videos to two groups, 85% of the videos is used to be our “training and testing”, the other 15% is used to be the “out-of- range” dataset. The “training and testing” dataset and “out-of-range” dataset have no common video sources. And the soccer videos in “out-of-range” dataset has the significant difference from “training and testing” dataset. For example, in our “out-of- range” dataset, there are some videos from the Spanish afternoon soccer matches which

have the strong sunshine on the ground and players, and the appearances of the objects in the videos have much more brightness colour than normal soccer videos. These videos are not in our training and testing dataset.

The purpose of building the “out-of-range” dataset is to evaluate the flexibility of the systems if the system is able to give the correctly prediction for the inputs which have the differences from the training and testing data. However, it is not the transferring learning process, the system would not obtain any update in the evaluation process of the “out-of-range” dataset. The predict results of the system in “out-of-range” dataset give us the accuracy which compare to the “testing” dataset, in order to describe the flexibility of the system when process different data.

The total video data comprises over 800,000 frames, “training and testing” data set takes 85% of them where were randomly divided into the training data (70%), used to build the system’s needed parameters and the rule base, and testing data (15%), used to evaluate the classification accuracy of the system. Approximately 15% of matches to be the “out-of-range” dataset in order to test the flexibility and capability of classification system when running with the video data from difference video sources. In order to train, test and evaluate our systems, we have to pre-process the sources of “videos” to be the readable data of numerical values. Thus, we developed a GUI program named “VideoInfo” to load the videos and chop videos to small video clips as the first step. Secondly, we labelled the small video clips with the scene classes (“Centre Field”, “Players Close-up” and “Audiences & other people”). At last, we extracted features of the video clips to capture histograms and compute distance by using OPENCV library.

The process of the experiments for the fuzzy logic scenes classification in soccer video generally can be regarded as a two-stage operation, video data pre-processing and type-1 FLCS training (section 4.2) and testing.

The “VideoInfo” GUI program is coded for chopping the needed data from the download video sources. This program imported the original soccer video and make them into short time video slips (normally 3-15 seconds) which representing groups of continuous video frames. The continuous video clips generally belong to the same scene. However, the double check is needed during the label marking in the video clips. The mixed scenes video will be removed from the catalogue of classes, and the data will be separated into the training dataset, testing dataset, and “out-of-range” dataset, which are fed to train the type-1 FLCS. The training and testing video clips come from the same video resources, and the “out-of-range” dataset is from the different video resources. In the experiment, the “out-of-range” dataset is used to test the flexibility of FLCS, where trained by a different data source.

4.4.1.1 The process of building the dataset

The “VideoInfo” GUI program is a user interface similar to a video player, with the functions of video play, pause, and resume buttons, other basic functions and a window to show the current frame histogram with RGB colour distribution information. The video is loaded after the user selects the directory and the time information is stored in a text file to record the time information of each video clip. The screenshot of “VideoInfo” GUI program interface is shown in Figure 4.9.

Figure 4.9:The “VideoInfo” GUI program

“VideoInfo” GUI program system only complete the segmentation function to chop the original video source to short time video clips. The videos are separated by the “VideoInfo” due to the rapidly changes between two continues frames. The video

can be regarded as continuous sequences of images; thus two histograms of the image scenes will have significantly changes especially in the frame of current scene changes rapidly to another scene in the video. However, this segmentation could not build good enough dataset for our system required, the video clips generally are mixed with different scenes. We have to remove the scenes-mixed video clips manually and labelling the qualified video clips with the scene class.

4.4.1.2 Video Scenes Classification System

Figure 4.10 presents some screenshots of our system operation for scene classification, showing the prediction class label in “Centre” scene detection (a) and “People” scene detection (b). This program is the testing phase of type-1 FLCS prediction and also was used for type-2 FLCS and back propagation (BP) neural network to output the current video scene and predict its class. The prediction is shown as the text in the bottom of the program, and the current scene is in the top-left, with its histogram in the top-right.

(a) “Centre Field” scene detection

(b) “People” scene detection