
CHAPTER 04 A FRAMEWORK FOR HIGH LEVEL SEMANTIC ANNOTATION USING TRUSTED OBJECT

4.4 EXPERIMENTS AND EVALUATION

We used the [LabelMe] dataset for the experiments. It contains a total of 181,932 images, of which 56,946 are annotated, with 352,475 annotated objects across 12,126 classes. Testing the proposed system on all of the images was impractical, so we randomly selected 500 images.
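The random selection of 500 images can be sketched as follows. This is a minimal illustration, not the procedure actually used: the identifier list, the fixed seed, and the choice to draw from the 56,946 annotated images are all assumptions, since the text does not say which pool the sample was drawn from.

```python
import random

def sample_image_ids(image_ids, k=500, seed=0):
    """Return k distinct image ids drawn uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    return rng.sample(list(image_ids), k)

# Hypothetical identifiers standing in for the annotated LabelMe images.
annotated_ids = [f"img_{i:06d}" for i in range(56946)]
subset = sample_image_ids(annotated_ids)
```

Fixing the seed makes the evaluation subset repeatable, which helps when results on the same 500 images are compared across runs.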

For HLS annotation, we achieve good results on the FS and PS sets. Figure 4.22 compares the FS and PS sets for three randomly selected HLS examples.

Figure 4.22: Example of the HLS annotation on Full Similar (FS) and Partial Similar (PS) sets

Figure 4.22 shows the proportion between the FS and PS sets. The basic intention of categorizing the images into FS and PS sets is to minimize human intervention and to automate the process of high level semantic description of the images. The categorization into Full Similar and Partial Similar sets is based on a novel concept, the Semantic Intensity (SI) of the different concepts within a single image.


It is well known that an image is a combination of different objects, and that different combinations of these objects constitute different semantic meanings. Some concepts within an image are more dominant than others. The proposed technique intends to categorize images by matching the concept tags with the images and their semantic intensity (see section 4.3.2). In Figure 4.22, the PS sets have higher values than the FS sets, because two images rarely share exactly the same semantics. For instance, two images may contain a similar combination of objects but convey different semantic ideas: images of the simple high level concepts car park and street may both contain objects such as tree, road, people, car, building, sky, etc.

Even though both concepts have the same object constitution, they differ in the dominance level of the objects. In a street view the car object is less dominant, whereas in car park images the car object is more dominant than concepts such as people and building, which are more dominant in a street view. Traditional systems based on primitive feature extraction, object recognition and matching techniques fail to differentiate between images of these two concepts. We attempt to remove this bottleneck by exploiting the semantic intensity to differentiate street images from car park images. This is why full similarity between images is rare, while partial similarity is possible owing to the dynamic semantics of images: in the PS sets, an image receives more than one HLS description, representing the dynamics of its semantics.

Information science has developed many criteria and standards for evaluation, e.g. effectiveness, efficiency, usability, satisfaction, cost benefit, coverage, time lag, presentation and user effort. Among these, precision, which relates to specificity, and recall, which relates to exhaustivity, are the well accepted measures. As in previous research, the quality of image annotation in terms of high level semantics can be measured through precision and recall. Per-image precision and recall are calculated on the basis of a single test image taken from the corpus prepared for high level semantic propagation. For each test image, precision is defined as the ratio of the number of semantic descriptions that are correctly predicted to the total number of semantic descriptions predicted for the image in the cluster set, and recall is the ratio of the number of semantic descriptions that are correctly predicted to the number of semantic descriptions in the cluster sets. Mathematically, they are calculated as follows:


… (4.8)

… (4.9)
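The per-image definitions given in the text can be illustrated with a short sketch. It assumes that the predicted and reference semantic descriptions are available as sets of strings, and it reads "the number of semantic descriptions in the cluster sets" as the size of the reference set. The function name and the sample descriptions are hypothetical.

```python
def precision_recall(predicted, reference):
    """Per-image precision and recall over sets of semantic descriptions.

    precision = correctly predicted / all predicted
    recall    = correctly predicted / all reference descriptions
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical descriptions for one test image.
predicted = {"city view", "street", "people walking"}
reference = {"city view", "people walking", "traffic", "buildings"}
p, r = precision_recall(predicted, reference)
```

Here two of the three predicted descriptions are correct and two of the four reference descriptions are recovered, so precision is 2/3 and recall is 1/2.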

Figure 4.23: Precision and recall in terms of HLS description for the FS set of 10 sample images.

To validate and verify the effectiveness of the proposed framework for high level semantic annotation propagation, we applied queries to the corpus and checked the results. The proposed technique achieves a noticeable improvement in terms of precision and recall. Figure 4.23 shows the precision and recall of the top 10 query results for three randomly selected HLS annotations used as queries. The three HLS annotations are: (1) City view, where people walking in the street. (2) Highway showing vehicles on the road. (3) Park, where people play games, while some are doing exercise. The precision-recall curve shows a considerable improvement in terms of specificity and exhaustivity on the FS sets of the images. There is variation among the three selected semantically enriched high level conceptual queries. This variation arises because the precision of the system sometimes decreases as the complexity of the query increases, which is difficult to deal with. The high level semantic concept Park is itself a homonym (a word with the same spelling but different meanings): it covers two concepts, car park and recreation park, and queries of this type are very difficult to handle. In Figure 4.23, the query Park also contains the concepts people and game, which directs it towards the recreation park. Even so, in most circumstances the precision of such queries is lower. The mean average precision for the queries based on the full similar set is 0.64 for "City view, where people walking in the street", 0.72 for "Highway showing vehicles on the road", and 0.53 for "Park, where people play games, while some are doing exercise".
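The text does not state how the mean average precision over the top 10 results is computed. The sketch below shows one common variant, in which average precision is the mean of precision@rank taken at each relevant result within the top k. The function names and the relevance list are hypothetical and are not the data behind Figure 4.23.

```python
def average_precision(relevance, k=10):
    """Average precision over the top-k results of one query.

    relevance: list of 0/1 flags in ranked order (1 = result is relevant).
    Normalizes by the number of relevant results found in the top k,
    which is one common convention (an assumption here).
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(runs, k=10):
    """Mean of average_precision over several ranked result lists."""
    return sum(average_precision(r, k) for r in runs) / len(runs)

# Hypothetical relevance judgements for the top 10 results of one query.
ranked = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
ap = average_precision(ranked)
```

Other conventions divide by the total number of relevant items in the corpus instead, which yields lower values when relevant items fall outside the top k.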


Figure 4.24: Precision and recall in terms of HLS description for the PS set of 10 sample images.

Figure 4.24 shows the precision-recall curve @10 for the PS set, using the same three HLS annotations as for the FS set. The PS curve is higher than the FS curve (Figure 4.23), because partial semantic sharing among images is more likely than full similarity. The mean average precision is 0.74 for "City view, where people walking in the street", 0.73 for "Highway showing vehicles on the road", and 0.68 for "Park, where people play games, while some are doing exercise".
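The reported mean average precision values for the FS and PS sets can be compared query by query. The short sketch below only restates the figures given in the text; the shortened query labels are used for convenience.

```python
# Mean average precision reported in the text, keyed by shortened query label.
fs_map = {"City view": 0.64, "Highway": 0.72, "Park": 0.53}
ps_map = {"City view": 0.74, "Highway": 0.73, "Park": 0.68}

# Gain of the PS set over the FS set for each query, rounded to two decimals.
gain = {query: round(ps_map[query] - fs_map[query], 2) for query in fs_map}
largest = max(gain, key=gain.get)
```

The gain is largest for the Park query (0.15) and smallest for the Highway query (0.01).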

4.5 Chapter Summary

The focus of this chapter is on the process of manual annotation for object-annotated image datasets. We present a novel framework for HLS support; this kind of work is a unique approach to date for HLS annotation of a large-scale image corpus. The framework can easily be made automatic by integrating an automatic object detector and recognizer. In the workflow of the framework, cluster sets of full similar (FS) and partial similar (PS) images are prepared for each image individually using the image similarity mechanism, with defined thresholds of 0.80 and 0.50 for the FS and PS sets respectively. A High Level Semantic description is then propagated: it is assigned to one image, and the system automatically spreads it to all the images in the FS and PS sets. This technique reduces the effort of manual annotation and produces high semantic accuracy in terms of precision for large pools of image data. It provides a rich insight into the image in terms of semantics rather than the contents of the image. The experiments were conducted on a randomly selected portion of the LabelMe dataset. Improvements were achieved in terms of semantic accuracy, effort and precision.
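The thresholding and propagation steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions: the similarity scores are taken as precomputed values in [0, 1] (the chapter's similarity mechanism and Semantic Intensity are not reproduced here), the thresholds are treated as inclusive lower bounds, and the PS set is taken to be disjoint from the FS set (0.50 <= score < 0.80). All names are hypothetical.

```python
FS_THRESHOLD = 0.80  # threshold stated in the text for the FS set
PS_THRESHOLD = 0.50  # threshold stated in the text for the PS set

def build_cluster_sets(query_id, similarity):
    """Split images into FS and PS sets for one query image.

    similarity: dict mapping image id -> similarity to the query image.
    Inclusive bounds and disjoint sets are assumptions of this sketch.
    """
    fs, ps = [], []
    for image_id, score in similarity.items():
        if image_id == query_id:
            continue
        if score >= FS_THRESHOLD:
            fs.append(image_id)
        elif score >= PS_THRESHOLD:
            ps.append(image_id)
    return fs, ps

def propagate(description, query_id, similarity, annotations):
    """Attach an HLS description to the query image and to its FS and PS sets."""
    fs, ps = build_cluster_sets(query_id, similarity)
    for image_id in [query_id, *fs, *ps]:
        annotations.setdefault(image_id, []).append(description)
    return annotations

# Hypothetical similarity scores relative to image "q".
scores = {"a": 0.91, "b": 0.62, "c": 0.30}
fs_set, ps_set = build_cluster_sets("q", scores)
labels = propagate("Highway showing vehicles on the road", "q", scores, {})
```

Because descriptions are appended to a list rather than overwritten, an image that falls into the cluster sets of several annotated images accumulates more than one HLS description, in line with the behaviour described for the PS sets.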

In document Semantic multimedia modelling & interpretation for annotation (Page 173-179)