Conclusions - Generating Category-based Documents

5.2 Generating Category-based Documents

5.2.4 Conclusions

Retrieving information by content is a non-trivial problem. We have presented a relatively simple technique for generating documents for an image query that requires neither reference to the training documents (i.e. after learning the semantic projection we no longer make use of the training documents) nor keyword assignment to the image or its segments. The results are promising for the novel approach of learning the association between text and images in order to generate new documents. We have also shown that we are able to generate documents of similar content that contain more words than the original associated text. Despite the limited number of categories, we believe the presented results constitute a proof of concept.

Chapter 5 Imagery & Text Taxonomy

Figure 5.6: Paintball: Original text “benini reffing”; generated document with 10 words, from highest to lowest rank: “fate, darkside, kc, strange, team, wildcards, takeover, avljalde, hostile, check”.

Figure 5.7: Paintball: Original text “all americans”; generated document with 10 words, from highest to lowest rank: “ref, farside, american, team, trauma, stay, leader, flag, takeover, avljalde”.

It is important to state that our task differs from the conventional image-word retrieval problem, which aims to associate with an image the keyword that best describes its actual content. We are instead interested in associating a document with the content genre of an image (as pre-assigned by a website). This association can then be used to generate documents (information) for new image queries.

Figure 5.8: Sports: Original text “ap photo more photos february 18, 2002 toronto (ap) – sam cassell’s injured toe didn’t hurt his effectiveness against the toronto raptors. but he might be selective about future games he plays in so he’s ready for the playoffs. “i’d rather take care of it now,” said cassell, who missed two games with a sprained left big toe before scoring 20 points in the milwaukee bucks’ 91-86 victory over the toronto raptors on sunday. “this is not a joke,” he said. “this is probably the worst i’ve felt as a professional basketball player. it’s painful every step you take. coach says, ’you can take the pain!’ but not this kind of pain.” cassell, who scored eight points in the fourth quarter, said he would need 20 days to heal. “we have a game (monday) . . .”; generated document with 10 words, from highest to lowest rank: “boyz, attitude, pt, hot, urban, quest, rip, team, ap, matrix”.

5.3 Summary

The analysis of image and text is not new to the field of machine learning, although developments in adjacent fields continue to introduce new problems involving both. In this chapter we have introduced a computationally efficient approach to generic object categorisation, a field crucial to the evolution of cognitive and autonomous vision systems. The approach reduces the number of possible combinations of image features by creating a uniform feature using K-means. The resulting method maintains good results while being less computationally expensive than the leading approach. We then introduced a new approach to viewing online image-text data, suggesting that rather than creating a keyword association, which would be beneficial for search engines, one creates a content-based association. This could allow the retrieval and/or generation of documents for image queries that are of the same content category. On the whole, it has been demonstrated how semantic models can be used with learning algorithms (or alone) to solve predominant problems in the field of image and text taxonomy.

Chapter 6

Music

“A performer who had not heard any of my music questioned me about several aspects of my work. On the topic of computer music, he had several technical questions and asked about the general process I employ in realizing a piece. I stated that I believed the specific technologies employed are unimportant but out of familiarity I generally write algorithms in C to generate my pieces. The performer responded, ‘Well, I don’t know what you’d use an algorithm for but I’m glad to hear it’s at least tonal.’ ”

- Jason Thomas

Music could probably be considered a continuation of the human soul. From the dawn of mankind to the present day there has been a primal need for rhythm, a need that remains largely unexplained. As with any form of art, music is an expression of style, emotion and individualism. In this chapter we apply machine learning in order to test whether music, and its elements, can be quantised, measured, analysed and even reproduced, hopefully bringing us a step closer to understanding the true nature of music. We address two main issues in learning music. Musical signals are dynamic and exhibit long-term temporal dependencies, which raises fundamental issues in machine learning regarding how to develop methods for discovering and representing these dependencies for musical analysis and generation.

6.1 Identifying Famous Performers From Their Playing Style

In this section, we focus on the problem of identifying famous pianists using only minimal information obtained from audio recordings of their playing. We use a technique called the performance worm, which plots a real-time trajectory in 2D space. The worm is used to analyse changes in tempo and loudness at the beat level while extracting features for learning. Previous work on this data has compared a variety of machine learning techniques, using statistical quantities obtained from the performance worm as features. It is possible, however, to obtain from the worm trajectory a set of cluster prototypes that capture certain characteristics over a small time frame, say of two beats. These cluster prototypes form a ‘performance alphabet’ that captures some aspects of individual playing style. For example, a performer may consistently produce loudness/tempo changes unique to themselves at specific points in a piece, e.g. at the loudest sections. Once a performance alphabet is obtained, each prototype can be assigned a symbol and the audio recordings can then be represented as strings constructed from this alphabet.

Table 6.1: Movements of Mozart piano sonatas selected for analysis.

Sonata   Movement   Key        Time sig.
K.279    1st mvt.   C major    4/4
K.279    2nd mvt.   C major    3/4
K.279    3rd mvt.   C major    2/4
K.280    1st mvt.   F major    3/4
K.280    2nd mvt.   F major    6/8
K.280    3rd mvt.   F major    3/8
K.281    1st mvt.   Bb major   2/4
K.282    1st mvt.   Eb major   4/4
K.282    2nd mvt.   Eb major   3/4
K.282    3rd mvt.   Eb major   2/4
K.330    3rd mvt.   C major    2/4
K.332    2nd mvt.   F major    4/4
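As a rough illustration of how a worm trajectory might be turned into a performance string, the sketch below clusters two-beat windows of (tempo, loudness) points and replaces each window with the symbol of its nearest prototype. Several details are assumptions made for illustration only and are not taken from the text: plain k-means is used as the clustering method, windows overlap by one beat, the toy `worm` values are invented, and all function names are hypothetical.

```python
import random

def nearest(p, centres):
    """Index of the centre closest to p (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centres[i])))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns k prototypes.
    Assumption: the text does not say which clustering method is used."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Mean of each cluster; an empty cluster keeps its old centre.
        new_centres = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres

def segments(worm, beats=2):
    """Flatten each window of `beats` consecutive (tempo, loudness) points
    into one vector. Overlapping windows are an assumption."""
    return [tuple(v for point in worm[i:i + beats] for v in point)
            for i in range(len(worm) - beats + 1)]

def to_string(worm, prototypes, beats=2):
    """Represent a performance as a string over the prototype alphabet."""
    symbols = "abcdefghijklmnopqrstuvwxyz"
    return "".join(symbols[nearest(s, prototypes)]
                   for s in segments(worm, beats))

# Invented toy trajectory: one (tempo in bpm, loudness in [0, 1]) pair per beat.
worm = [(60.0, 0.20), (62.0, 0.25), (90.0, 0.80), (92.0, 0.85),
        (61.0, 0.22), (63.0, 0.24), (91.0, 0.82), (93.0, 0.86)]

prototypes = kmeans(segments(worm), k=2)
performance_string = to_string(worm, prototypes)
print(performance_string)
```

In this toy data tempo dominates the distance because the two dimensions are on different scales; in practice one would probably normalise tempo and loudness before clustering.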

The task addressed here is to learn to recognise pianists solely from characteristics of their performance strings. The ability of kernel methods to operate over string-like structures, using kernels such as the n-gram kernel and the string kernel, will be evaluated on this task. In addition to simply applying an SVM (as introduced in Section 2.4) to the data, we will also examine the ability of dimension reduction methods such as Kernel PCA and Kernel Partial Least Squares (KPLS) (as introduced in Chapter 3) to extract relevant features from the data before applying an SVM, which will hopefully lead to improved classification performance.
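To make the n-gram kernel concrete, here is a minimal sketch of one common formulation: the inner product of the n-gram count vectors of two strings, optionally normalised to remove the effect of string length. The function names, the choice of n = 2 and the example strings are assumptions for illustration; the exact kernel variant and parameters used in the experiments are not specified in this passage.

```python
from collections import Counter
from math import sqrt

def ngram_counts(s, n):
    """Count every contiguous substring of length n in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_kernel(s, t, n=2):
    """Inner product of the n-gram count vectors of s and t."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(count * ct[gram] for gram, count in cs.items())

def normalised_ngram_kernel(s, t, n=2):
    """Cosine-normalised kernel, so that k(s, s) = 1 for non-trivial s."""
    denom = sqrt(ngram_kernel(s, s, n) * ngram_kernel(t, t, n))
    return ngram_kernel(s, t, n) / denom if denom else 0.0

def gram_matrix(strings, n=2):
    """Kernel matrix over a list of performance strings."""
    return [[normalised_ngram_kernel(s, t, n) for t in strings]
            for s in strings]

# Hypothetical performance strings over a two-symbol alphabet.
strings = ["abab", "abba", "bbbb"]
K = gram_matrix(strings)
for row in K:
    print(["%.3f" % v for v in row])
```

A matrix such as `K` could then be handed to any kernel method that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, although the text does not state which implementation was used.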

In document Semantic Models for Machine Learning (Page 86-91)