3.2 Machine Learning in Art Practice
3.2.2 Components of a Machine Learning Algorithm
Parallel to the category of task they are designed for, Machine Learning algorithms can be qualified by the interoperability of four constituents: (1) the model; (2) the optimization procedure; (3) the
Models
A model, in Machine Learning, is the computational structure that gets modified through learning. The best way to think of a model is as a function that tries to approximate as closely as possible a distribution of data, based on a sample of that distribution (the dataset). The model contains free parameters that are to be adjusted by the training algorithm. For example, in the Multi-Layer Perceptron, the parameters are the “weights” or “synapses” that connect the neurons with one another. Other models include decision trees, Bayesian networks, Support Vector Machines and nearest-neighbor models. In a GA, the model is the function that associates DNA strings with a phenotype, while the chromosomes are the free parameters to adjust.
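As a rough illustration of a model as a function with free parameters, the following minimal sketch implements a very small Multi-Layer Perceptron in plain Python. The layer sizes, the sigmoid activation, the parameter values, and all names (`mlp`, `params`, `hidden_weights`, and so on) are assumptions chosen for the example; they are not taken from any specific artwork or library.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp(inputs, params):
    """Forward pass of a tiny perceptron with one hidden layer.

    The structure (two layers of sigmoid neurons) is fixed; only the
    numbers stored in `params` are meant to change during learning.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(params["hidden_weights"], params["hidden_biases"])
    ]
    total = sum(w * h for w, h in zip(params["output_weights"], hidden))
    return sigmoid(total + params["output_bias"])


# Two settings of the same free parameters (hypothetical values).
untrained = {
    "hidden_weights": [[0.0, 0.0], [0.0, 0.0]],
    "hidden_biases": [0.0, 0.0],
    "output_weights": [0.0, 0.0],
    "output_bias": 0.0,
}
adjusted = dict(untrained, output_weights=[1.0, 1.0])

observation = [0.3, 0.9]
print(mlp(observation, untrained))  # all-zero weights: no preference
print(mlp(observation, adjusted))   # same structure, different response
```

The function `mlp` never changes; what a training algorithm would modify is only the content of `params`, which is why the same structure can yield different responses to the same observation.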
Models are the object of important debates in the field of Machine Learning, being the defining flagships of different research strands. However, when it comes to artistic works, they are possibly the least explored dimension. As was expressed earlier, most adaptive artworks involve a very restricted set of models, which happen to be among the most easily understandable and applicable ones (GAs and SOMs). This most likely has to do with the fact that scientists and artists have different goals and expectations. To put it simply, an apparently small improvement in the performance of a model can be seen as revolutionary from a scientist’s perspective but will not change much in terms of how it affects the experience of an artwork.
Nonetheless, there are at least three ways in which models can affect artistic outcomes. First, the nature of the model is often an important part of the concept of a piece: the imaginary space opened up through the use of neural nets differs conceptually from that of evolutionary computation or decision trees. For example, Sims’ Galápagos plays with the richly evocative nature of evolution, allowing the user to take part in a story of genetic adaptation as the godlike subject that runs the natural selection process. Ben Bogart’s installation Dreaming Machine #2 (2009) and Ralf Baecker’s Mirage (2014) both involve neural networks in pieces about memory and dreaming, two themes that lie at the center of the research on neurology that directly inspired computer-based connectionism.
presents an artistic strategy whereby a Genetic Algorithm model is used in a way utterly different from what it was designed for. Google’s DeepDream is a good example of a creative approach that exploits specific properties of a neural net to transform it into a generative device it was never meant to be. These artistic strategies usually take advantage of an accidental feature of a model, diverting it from its habitual or intended use. They require a good comprehension of the model and/or an experimental approach.
The third process by which models can impact artistic works is more subtle and has not been the object of serious analysis. It has to do with the fact that different models will yield, or afford, different kinds of behaviors. The variety and types of behavioral strategies that the model allows, and the “smoothness” — or “abruptness” — in the evolution of these strategies during learning, are examples of how models can affect agent aesthetics.37
Optimization Procedure
The optimization procedure (also called search or training algorithm, depending on the context) changes the parameters of the model in an attempt to improve its responses over time. Different kinds of such procedures exist, each with its own advantages and domain of application. For example, there is a vast amount of research on training algorithms for neural networks, using different optimization approaches such as Stochastic Gradient Descent, Genetic Algorithms, and simulated annealing.
Most optimization algorithms exploit the cybernetic notion of negative feedback: in response to the perceived error yielded by its actions, the organism adjusts its inner structure over time, step by step, moving towards an optimum. Whereas many cyberneticians were interested in the process itself, for scientists working in the field of Machine Learning, optimization is a means to an end. The principal goal is to train a system that will perform well on a particular task once it has been optimized. What happens before that, the behavior of the system as it gets there, is considered irrelevant. Yet this is probably what is most relevant to an aesthetics of adaptive behavior.
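The negative feedback loop described above can be sketched with plain gradient descent on a one-parameter model. This is only an illustration under stated assumptions: the model `y = w * x`, the three data points, the learning rate, and the function name `train` are all hypothetical. The sketch keeps the whole trajectory of the parameter, since the path taken towards the optimum is precisely what matters here.

```python
def train(xs, ys, learning_rate=0.05, steps=30):
    """Fit y = w * x by gradient descent and keep the whole trajectory."""
    w = 0.0
    trajectory = []
    for _ in range(steps):
        # Perceived error: how far each response is from its target.
        errors = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / len(xs)
        trajectory.append((w, loss))
        # Negative feedback: move the parameter against the error gradient.
        gradient = sum(2.0 * e * x for e, x in zip(errors, xs)) / len(xs)
        w -= learning_rate * gradient
    return w, trajectory


final_w, trajectory = train(xs=[1.0, 2.0, 3.0], ys=[2.0, 4.0, 6.0])
print(final_w)
for w, loss in trajectory[:5]:
    print(w, loss)  # early steps: large corrections, quickly shrinking error
```

A scientist would typically keep only `final_w`; an artist interested in adaptive behavior might instead map the successive values stored in `trajectory` onto movement, sound, or light.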
37 The advantages and disadvantages of neural nets, as opposed to GAs and other techniques such as fuzzy logic and Support Vector Machines, are a broadly debated topic in the field of Machine Learning.
The learning process can typically be fine-tuned using a set of meta-parameters. For example, most optimization methods involve a learning rate parameter, which represents the speed at which the system moves towards a local minimum. There is, however, a trade-off: high learning rates get models to converge faster but often yield poorer results, whereas lower values take more time but result in a finer model. A common way to address this trade-off is to start with a larger learning rate and slowly decrease it over time.
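The trade-off can be made concrete with a small sketch that compares a high, a low, and a decaying learning rate on the same problem. Everything here is an assumption made for illustration: the objective `f(w) = w**2`, the alternating +1/-1 disturbance (a deterministic stand-in for the noise found in stochastic gradients), the starting point, and the decay formula `0.4 / (1 + 0.1 * step)`, which is only one of many possible schedules.

```python
def noisy_gradient(w, step):
    """Gradient of f(w) = w**2 plus a deterministic +1/-1 disturbance."""
    noise = 1.0 if step % 2 == 0 else -1.0
    return 2.0 * w + noise


def train(schedule, steps=50, w=5.0):
    """Run gradient descent; `schedule(step)` gives the learning rate."""
    for step in range(steps):
        w -= schedule(step) * noisy_gradient(w, step)
    return w


high = train(lambda step: 0.4)    # fast, but keeps bouncing around the optimum
low = train(lambda step: 0.01)    # steady, but still far from the optimum
decayed = train(lambda step: 0.4 / (1.0 + 0.1 * step))  # large first, then finer

# The optimum is w = 0, so the absolute value is the remaining error.
for name, w in (("high", high), ("low", low), ("decayed", decayed)):
    print(name, abs(w))
```

In this toy setting the decaying schedule ends closer to the optimum than either constant rate after the same number of steps, which is the intuition behind starting large and decreasing over time.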
Another example is the exploration vs exploitation dilemma in Reinforcement Learning (Sutton and Barto 1998, 4). When an agent moves in a space searching for the best strategy to maximize its reward over time, it needs to be able to both exploit its current knowledge (by making decisions it thinks are going to yield good rewards) and explore new avenues (so as to avoid getting stuck in a region of the space that yields poor rewards). Exploration is usually more chaotic and random, while exploitation is targeted and greedy. In a typical RL setup, agents will start by exploring and, over time, be tuned to favor exploitation as they become more efficient in accumulating rewards.
The agent’s tendency to favor exploration over exploitation is usually represented by a single parameter. For example, in one of the most commonly used learning policies, called ε-greedy, a parameter ε between 0 and 1 represents the probability that, at any given step, the agent will choose a completely random action (if not, then it will choose the action it believes will yield the highest return, hence the name “greedy”) (28). Altering ε can be used as an aesthetic trick in agent-based systems, allowing the shaping of behaviors in real time, moving them between chaos and order. This strategy was applied in the immersive installation/performance piece N-Polytope (2012), for the construction of live generative behavioral patterns, as described in section 6.3.4.
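The ε-greedy rule described above fits in a few lines. The following is a minimal sketch, not the code used in N-Polytope: the function name `epsilon_greedy`, the list of value estimates, and the three tested values of ε are assumptions made for the example.

```python
import random


def epsilon_greedy(action_values, epsilon, rng):
    """Pick an action index from a list of estimated returns."""
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly at random.
        return rng.randrange(len(action_values))
    # Exploit: the action currently believed to yield the highest return.
    return max(range(len(action_values)), key=lambda a: action_values[a])


rng = random.Random(0)
estimates = [0.1, 0.5, 0.9]  # hypothetical value estimates for 3 actions

for epsilon in (0.0, 0.5, 1.0):
    choices = [epsilon_greedy(estimates, epsilon, rng) for _ in range(1000)]
    greedy_share = choices.count(2) / len(choices)
    print(f"epsilon={epsilon}: greedy action chosen {greedy_share:.0%} of the time")
```

Because ε is a plain number read at every step, it can be changed while the system runs, for instance from a slider or a sensor, which is what makes it usable as a live control between order (ε near 0) and chaos (ε near 1).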
Evaluation Function
The evaluation function measures the performance of the model in completing its task. In Supervised and Unsupervised learning, it is usually referred to as the loss function or cost function. In a classification task, for example, the category predicted by the model given an example to classify is compared to the expected target category: the more the model misses the target, the larger the loss. In Reinforcement Learning, the evaluation function is called the reward function, while in Genetic Algorithms, it corresponds to the fitness function.
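For the classification case, two common loss functions can be sketched as follows. The choice of the zero-one loss and the cross-entropy loss is an assumption made for illustration (the text does not name a specific loss), as are the function names and the example probabilities.

```python
import math


def zero_one_loss(predicted_category, target_category):
    """0 when the predicted category is the target, 1 otherwise."""
    return 0 if predicted_category == target_category else 1


def cross_entropy_loss(probabilities, target_category):
    """Penalize the model by how little probability it gave the target."""
    return -math.log(probabilities[target_category])


# Hypothetical model output: probabilities for three categories.
prediction = [0.7, 0.2, 0.1]
most_likely = prediction.index(max(prediction))

for target in range(3):
    print(target,
          zero_one_loss(most_likely, target),
          round(cross_entropy_loss(prediction, target), 3))
```

The zero-one loss only says whether the model missed; the cross-entropy loss also grades how badly it missed, growing as the probability assigned to the expected category shrinks.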
Among the three dimensions of a Machine Learning algorithm, the evaluation function is probably the one that is the most readily usable by authors. This is because it has been designed specifically for the purpose of bringing human input into the equation. Models and optimization procedures are meant to be rather agnostic: the evaluation function determines the kind of “problem” one tries to solve. However, the approach in art completely differs from that of science. While scientists use evaluation functions as objective criteria for the learning algorithm to solve, artists typically use the evaluation function as a tool for generating self-organizing behaviors, subject to their own authorial control. In other words, for scientists, the evaluation function represents the goal they aim to achieve, without any care for the way it is reached (i.e., the goal is more important than the process to reach it), whereas for artists the relationship between the evaluation function and the goal (which is to generate interesting behaviors) is indirect (i.e., the process is the goal).
Artists can thus play with evaluation functions and observe how the agent responds. An evaluation function can also be learned or attributed by another agent. Finally, evaluation functions can be interactive, with either the artist or the audience replacing the function by directly giving an evaluation of the system’s performance. In the case of evolutionary computation, this technique is known as an Interactive Genetic Algorithm (IGA), an approach first proposed by Richard Dawkins (Dawkins 1986).
Karl Sims’ Galápagos (1997), which was presented earlier in section 3.2.1, is one of the most renowned examples of the use of IGA in an interactive installation. Here, visitors are asked to select their favorite artificial 3D creatures, whose genetic code is used to create the next generation through mutations and crossovers. Core to the work’s aesthetics is its participatory nature, engaging audiences in the production of novel forms through a playful and intriguing experience.
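The general mechanism of an Interactive Genetic Algorithm can be sketched as follows. This is not Sims' implementation: the genome encoding (a list of numbers), the one-point crossover, the mutation scheme, and all names (`next_generation`, `visitor_choice`, and so on) are assumptions made for the example. The point is only that no fitness function is computed; the indices of the selected genomes come from a person.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(genome, rate, rng):
    """Replace each gene, with probability `rate`, by a fresh random value."""
    return [rng.random() if rng.random() < rate else gene for gene in genome]


def next_generation(population, chosen, rate, rng):
    """Breed a new population from the genomes picked by a human evaluator."""
    parents = [population[i] for i in chosen]
    children = []
    for _ in range(len(population)):
        a = rng.choice(parents)
        b = rng.choice(parents)
        children.append(mutate(crossover(a, b, rng), rate, rng))
    return children


rng = random.Random(42)
population = [[rng.random() for _ in range(4)] for _ in range(6)]
# Stand-in for the visitors' selection; an installation would read it
# from its interface instead of hard-coding it.
visitor_choice = [1, 4]
population = next_generation(population, visitor_choice, rate=0.1, rng=rng)
print(len(population), "new genomes bred from the selected ones")
```

In an installation, each genome would be rendered as a creature (its phenotype), and the loop would wait for the audience's selection before breeding the next generation.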
The Fifth Absence (2011) is another example of how an evaluation function can be used poetically in the generation of an artificial behavior. As described earlier, the work involves a robotic agent immersed in a behavioral conundrum through the implementation of a reward function precisely designed to generate it. The agent in this artwork is forced to discover, through trial and error, a strategy that will allow it to reconcile its desire to avoid looking at light sources with its need to get solar energy. The slow-paced behavior of the agent, which moves about once every 2–3 minutes, places it in a different category than Galápagos in terms of aesthetics. Like most other interventions in Absences, this is a very conceptual piece, as the shape of its behavior cannot be perceived in real time by human subjects and thus needs to be imagined by the audience.
Data
Data is an often overlooked, yet crucial dimension to consider when thinking about adaptive behaviors, especially in an artistic context. There are practical concerns when dealing with data encoding, as well as challenging issues that arise when dealing with high-dimensional spaces, as is the case with image or speech recognition, which are largely beyond the scope of this dissertation.
The first thing to consider with regard to data is the kinds of inputs that will be fed into the system and the kinds of outputs it will produce: in other words, what the agent will be able to observe, and how it will be able to respond to these observations. In order to be effective, these inputs and outputs need to be carefully chosen to afford the kind of experience the artist has in mind. Moreover, there needs to be a way for the agent to make inferences, otherwise no learning will happen. For example, a system that can only detect light cannot be asked to learn about the sounds made by visitors.
The set of sensors/observations/inputs and actuators/actions/outputs, and the way they are embodied in the adaptive physical devices that are staged in an agent-based artwork, possibly constitute the most important decision an artist has to make in the creative process, as it will define the kind of space in which the agent can evolve, the sort of behaviors it can afford.
Secondly, it is self-evident that the data distribution from which the examples are selected has an important influence on the reactions and establishment of the system’s behavior. One of the most dreaded issues in Machine Learning is overfitting, a problem that arises when a system estimates “too perfectly” a specific dataset, thus becoming less efficient at making predictions on unseen samples (i.e., taken outside of the training dataset). While overfitting is a plague for data scientists, it might actually be exploited creatively by artists, by hand-picking data (such as by creating a constrained environment) in order to encourage a specific response in the system.
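Overfitting can be illustrated with a deliberately small, fully deterministic example. All of it is hypothetical: the one-dimensional data, the "true" rule (label 1 when `x >= 0.5`), the four corrupted training labels, and the two models being compared (a nearest-neighbor model that memorizes every example and a single learned threshold).

```python
def nearest_neighbor(train_set, x):
    """Memorizing model: answer with the label of the closest stored example."""
    return min(train_set, key=lambda example: abs(example[0] - x))[1]


def fit_threshold(train_set):
    """Simple model: the single cut-off that best separates the two labels."""
    def correct(threshold):
        return sum((x >= threshold) == bool(label) for x, label in train_set)
    return max((x for x, _ in train_set), key=correct)


def accuracy(predict, data):
    """Share of examples for which the prediction matches the label."""
    return sum(predict(x) == label for x, label in data) / len(data)


# The true rule is "label 1 when x >= 0.5", but four training labels
# (positions 2, 6, 13, 17) are deliberately corrupted.
noisy = {2, 6, 13, 17}
train_set = [(i / 20, int(i >= 10) ^ int(i in noisy)) for i in range(20)]
test_set = [((i + 0.4) / 20, int(i >= 10)) for i in range(20)]

threshold = fit_threshold(train_set)
models = {
    "nearest neighbor": lambda x: nearest_neighbor(train_set, x),
    "threshold": lambda x: int(x >= threshold),
}
for name, predict in models.items():
    print(name, accuracy(predict, train_set), accuracy(predict, test_set))
```

The memorizing model reproduces the training set perfectly, corrupted labels included, and therefore repeats those mistakes on unseen samples; the simpler model scores lower on the training set but generalizes better. Seen from the artistic angle suggested above, the corrupted labels play the role of hand-picked data: they are exactly what the memorizing system ends up expressing.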
Other Considerations
From both a scientific and an engineering perspective, Machine Learning techniques are simple in spirit, yet extremely complex when it comes to details. Many elements can influence the success or failure of a particular algorithm on a particular problem, and much energy is spent in the field comparing strategies and trying to extract general principles behind learning.
The biggest challenges are related to issues that arise when dealing with high-dimensional data, as is the case with image or speech recognition. These difficulties mainly concern questions of generalization, that is, the problem of training a model on a specific set of examples so that it becomes good at making predictions when faced with examples taken outside of that dataset. An important conceptual issue is known as the curse of dimensionality. It spans many unique problems that arise when dealing with high-dimensional data. One of the most fundamental consequences of the “curse” is that the number of free parameters (which amounts to the representational power of the model) needs to be tuned according to both the dimensions of the input space and the size of the training database.38
Figure 20: Example comparisons of how different kinds of models classify data points in a two-dimensional space, including a case of overfitting. Panels: (a) linear model (e.g., perceptron); (b) nonlinear model (e.g., MLP); (c) nonlinear model overfitting the data.
It is largely beyond the scope of this dissertation to give a detailed account of these theoretical concepts. However, artists should be aware that these techniques require at least some basic knowledge if one wants to be able to manipulate them as creative tools and material. Unfortunately, there exist almost no resources at the moment specifically dedicated to teaching artists about ML,39 and most of the tutorials require at least some degree of knowledge in mathematics and programming.