Visual style - COMMUNICATIVE MULTIMEDIA - Modeling Visual Rhetoric and Semantics in Multimedia

2.1.3 Visual style

In this section, we introduce the concept of visual style modeling. It has long been noted in the media studies community that the concept of style suffers from an overabundance of interpretations. One working definition, of particular relevance to the machine learning community, is given by David Bordwell, a distinguished film theorist, who defines style as the “patterned use of a medium’s techniques” [38]. We find this definition appealing because it does not restrict the type of medium and because it suggests recurring patterns that a machine may be able to model. We note that other vision works have modeled visual styles beyond those discussed in this dissertation, including fashion style [5, 173], urban style [78], and product styles in e-commerce images [116]. We consider two visual styles in this work: photographic style, in which a professional photographer’s particular artistic traits are reflected in his or her work, and persuasive or communicative style, which describes how a visual artist uses visual rhetoric within the image to persuade or communicate with viewers.

Photographic style modeling is also relevant to our work on modeling bias. For example, photographic styles may reveal the authorial intent of the photographer, which is of interest when assessing bias in photographs. We contrast our work on modeling visual style with work on modeling artistic style, which is commonly defined as the colors, textures, brush strokes, or dominating geometric patterns comprising artistic works, such as paintings or drawings [98, 97].

Modeling artistic and photographic style. The task of automatically determining the author of a particular work of art has always been of interest to art historians whose job it is to identify and authenticate newly discovered works of art. The problem has been studied by vision researchers, who attempted to identify Vincent van Gogh forgeries, and to identify distinguishing features of painters [287, 88, 163, 59]. While the early application of art analysis was for detecting forgeries, more recent research has studied how to categorize paintings by school (e.g., “Impressionism” vs. “Secession”) [305, 171, 163, 313, 15, 20, 29]. [29] experimented with a simple dataset of 7 painters with very different styles and achieved good results with low-level features due to the dataset’s simplicity. [305] explored a variety of features and metric learning approaches for computing the similarity between paintings and styles. Features based on visual appearance and image transformations have found some success in distinguishing more conspicuous painter and style differences in [29, 313, 171], all of which explored low-level image features on simple datasets. Recent research has suggested that, when coupled with object detection features, low-level features can yield state-of-the-art performance [20]. [15] used the Classeme [344] descriptor as their semantic feature representation. While it is not obvious that the object detections captured by Classemes would distinguish painting styles, Classemes outperformed all of the low-level features. This indicates that the objects appearing in a painting are also a useful predictor of style.

This dissertation considers photographic authorship identification, but the change of domain from painting to photography poses novel challenges that demand a different solution than that which was applied for painter identification. The distinguishing features of painter styles (paint type, smooth or hard brush, etc.) are inapplicable to the photography domain. Because the photographer lacks the imaginative canvas of the painter, variations in photographic style are much more subtle. Complicating matters further, many of the photographers in the dataset we collect are from roughly the same time period, some even working for the same government agencies with the same stated job purpose. Thus, photographs taken by the subjects tend to be very similar in appearance and content, making distinguishing them particularly challenging, even for humans.

There has been related work in computer vision that studies aesthetics in photography [234, 251, 68]. Some work also studies style in architecture [72, 197], vehicles [198], or yearbook photographs [102]. However, all of these differ from our goal of identifying authorship in photography. Most related to our work on predicting photographic authorship is the broad study of visual style in both paintings and photographs conducted by Karayev et al. [168]. The 20 style classes and 25 art genres considered in their study are coarse (HDR, Noir, Minimal, Long Exposure, etc.) and much easier to distinguish than the photographs in our dataset, many of which depict the same types of content and have very similar visual appearance.

While [168] studied style in the context of photographs and paintings, we explore the novel problem of photographer identification. We find it unusual that this problem remained unexplored for so long, given that photographs are more abundant than paintings, and there has been work in computer vision to analyze paintings. Given the lower level of authorial control that the photographer possesses compared to the painter, we believe that the photographer classification task is more challenging, in that it often requires attention to subtler cues than, for example, brush strokes. Besides our experimental analysis of this new problem, we also contribute the first large dataset of well-known photographers and their work. We also propose a method for generating a new photograph in the style of an author. Similarly, in our work on modeling visual persuasion and political bias, we generate new photographs exhibiting the persuasive or communicative styles that we model. We note that this problem is distinct from artistic style transfer (discussed below) [17, 39, 16], which adjusts the tone or color of a photograph.

Transferring artistic style. In this dissertation, we focus on modeling meaningful semantic concepts, such as photographic or persuasive style. We then use generative models to generate synthetic data in order to visualize what our models have learned. In our work on visual persuasion and political bias, we show how an existing image can be modified to bear the persuasive (or biased) styles we model. Our work is thus a type of semantic, rather than artistic, style transfer. Artistic style transfer methods attempt to render the content of one image in the artistic style of another. For example, we might modify a portrait to have the same artistic style as “Starry Night” by Van Gogh. Importantly, these methods do not seek to change the semantics of the image, but instead focus on changing low-level details, such as textures or colors. Early methods primarily rely on low-level (and often handcrafted) patch-based texture features [80, 131, 187, 318]. More recently, impressive results have been achieved using features extracted from pre-trained convolutional neural networks (CNNs) [97, 160, 83, 227, 367]. Gatys et al. [97] showed how style transfer can be formulated as an iterative optimization that seeks an image whose CNN activation statistics match those of both the “content” and “style” images. Follow-up works [160, 346] improve efficiency by performing style transfer in a single feed-forward pass, but these can only transfer towards those styles present during training [47]. Recent methods [47, 144, 204] combine the speed of feed-forward networks with the flexibility of optimization-based approaches, enabling fast style transfer on arbitrary styles. Unlike our work, all of these approaches focus on transferring low-level textures, while keeping the semantics of the produced image the same. In our setting, we seek to transfer high-level semantics, which changes the meaning of the image itself.
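To make the optimization-based formulation of Gatys et al. [97] more concrete, the following is a minimal sketch of the kind of objective such methods minimize; it is not the authors' implementation. It assumes that feature maps have already been extracted by a pre-trained CNN and flattened into one list of activations per channel. The function names, the toy values, the single-layer setup, and the `style_weight` parameter are all illustrative assumptions; the Gram matrix (channel correlations) stands in here for the “activation statistics” of the style image.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    features: list of C channels, each a list of N activations
    (the spatial positions flattened). Hypothetical toy format.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def mean_squared_difference(a, b):
    """Mean squared difference between two equally shaped 2-D lists."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def style_transfer_loss(candidate, content, style, style_weight=1.0):
    """Content term compares activations directly; style term compares
    Gram matrices, which discard the spatial layout of the activations."""
    content_loss = mean_squared_difference(candidate, content)
    style_loss = mean_squared_difference(gram_matrix(candidate),
                                         gram_matrix(style))
    return content_loss + style_weight * style_loss


# Toy feature maps: 2 channels, 2 spatial positions each (made-up values).
content_features = [[1.0, 0.0], [0.0, 1.0]]
style_features = [[2.0, 2.0], [0.0, 0.0]]

# Starting from the content image, the content term is zero and only the
# style term contributes; an optimizer would then adjust the candidate
# image to reduce the total.
loss = style_transfer_loss(content_features, content_features, style_features)
```

Because the style term depends only on channel correlations, rearranging the spatial positions of a feature map leaves it unchanged. This is one way to see why such objectives capture textures and colors rather than the semantic content that our work seeks to modify.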
