Sparse Representations for Blind Source Separation


2.5.4 Sparse Representations for Blind Source Separation

number of sources from a smaller number of observations. In this work, the number of observations is typically larger than one. The solution to this problem is often based on the fact that the representation of the signal is sparse in some transform domain. A decomposition of the signal can then be found by constraining the individual sources to have a sparse distribution. The extensive work by Zibulevsky and co-workers [157, 156, 158, 84, 159, 155] is a good example of this approach. Further examples can be found in [75, 81].

A Bayesian view of the problem is taken by Rowe in his work [124, 125, 123] and by Févotte in [32], while the work in [22] proposes a Gaussian mixture model, as mentioned previously, to solve this problem.

The work cited above requires that more than one observation of the mixture is available. In chapter 10 we study the use of the model proposed in chapter 3 for the problem of single channel source separation. Models for single channel source separation were previously introduced in [142] and [61, 62]. These models, however, incorporate prior knowledge of the sources in the form of source models. Shift-invariant spectral methods for single channel source separation have been studied in [143] and [130]. However, these papers assume a linear combination of features in the spectral domain and do not address the problem of clustering features into individual sources.

Conclusions

In this chapter we have introduced a linear generative model to describe observations. In order to discover salient structures, restrictive conditions are forced upon the representation. Sparsity has been shown to be a very powerful assumption in this context, but positivity, where applicable, can also produce good results.

We formulated the sparse coding model in a probabilistic framework. This allows us to specify sparseness measures as prior distributions. Maximum likelihood methods can then be used to adapt model parameters and in particular the set of features. This method leads to (at least locally) optimal solutions. However, exact solutions are not possible and approximations have to be introduced.
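As a compact reminder, the formulation summarised above can be written as follows. This is a sketch under stated assumptions: the symbols s (coefficients), ε (noise), σ² and λ are notation introduced here for illustration, the noise is assumed Gaussian, and the Laplacian prior is only one possible sparseness prior, not necessarily the one used in this work.

```latex
\begin{align}
  \mathbf{x} &= \mathbf{A}\mathbf{s} + \boldsymbol{\epsilon},
  \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}), \\
  p(\mathbf{s}) &\propto \exp\Bigl(-\lambda \sum_k |s_k|\Bigr), \\
  \hat{\mathbf{A}} &= \arg\max_{\mathbf{A}} \, p(\mathbf{x} \mid \mathbf{A})
  = \arg\max_{\mathbf{A}} \int p(\mathbf{x} \mid \mathbf{A}, \mathbf{s})\, p(\mathbf{s})\, d\mathbf{s}.
\end{align}
```

The integral over s is the step that generally has no closed form for sparse priors, which is consistent with the need for approximations noted above.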

From the section on applications, it is clear that time-series and images pose additional difficulties. Features can often occur at arbitrary locations.

The standard method of processing time-series in blocks leads to a model that has to learn features at all possible shifts. This not only increases the number of parameters to be adapted, it also requires the number of features to be large enough to cope with this repetition of features at different shifts. These effects and their influence on the extraction of features are studied in the next chapter, in which a shift-invariant sparse coding formulation is introduced. In this formulation, the model is explicitly constrained so that features can be used at arbitrary locations to describe the signal.

Chapter 3 Shift-Invariant Sparse Coding¹

In the standard sparse coding formulation introduced in the previous chapter, the observations x are vectors. However, many signals of interest in engineering, such as audio signals, are time-series. In order to deal with these time-series, it is customary to partition the sequence into smaller blocks. These blocks can then be used as the observations x in the sparse coding model. However, one motivation for the use of the sparse coding model is to represent the observations as a linear combination of salient features. In time-series such as audio, it is not generally known a priori at which time-locations features occur. The features present in a particular observation block are then randomly shifted with respect to the beginning of the block. In order to model this uncertainty, the standard sparse coding model has to include several copies of each feature at all possible time-locations.

This structure can be learned from the observations themselves, which requires that the model includes enough free parameters so that the features can be learned at different locations. It is, however, advantageous to keep the number of free parameters low, which can be done by explicitly enforcing the shift-invariant structure in the dictionary as suggested in [113, 79, 14, 102, 126, 147]. In this chapter we introduce this shift-invariant sparse coding model, which explicitly takes possible feature shifts into account.
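To make the parameter count concrete, the following minimal sketch builds the shifted copies that a block-based dictionary would need in order to represent a single feature at every position inside a block. The feature values, the block length and the helper name `shifted_copies` are hypothetical, and only shifts for which the feature fits entirely inside the block are considered.

```python
import numpy as np

def shifted_copies(feature, block_len):
    """Return a (block_len x n_shifts) matrix whose columns contain the
    feature placed at every position where it fits inside the block."""
    feat_len = len(feature)
    n_shifts = block_len - feat_len + 1
    atoms = np.zeros((block_len, n_shifts))
    for shift in range(n_shifts):
        atoms[shift:shift + feat_len, shift] = feature
    return atoms

# Hypothetical 3-sample feature and 8-sample block (illustrative values).
feature = np.array([1.0, -2.0, 1.0])
block_len = 8
atoms = shifted_copies(feature, block_len)

# An observation block in which the feature happens to start at sample 4.
block = np.zeros(block_len)
block[4:7] = feature

# Least-squares coefficients of the block in the dictionary of shifted copies.
coeffs, *_ = np.linalg.lstsq(atoms, block, rcond=None)

print("columns needed for one feature:", atoms.shape[1])
print("free parameters if each column is learned separately:", atoms.size)
print("free parameters if the shift structure is enforced:", feature.size)
print("coefficients:", np.round(coeffs, 6))
```

In this toy setting a single coefficient is active, but an unconstrained model would have to learn every column of `atoms` independently, whereas enforcing the shift structure leaves only the samples of the feature itself to be adapted.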

The first section defines and clarifies the notion of shift-invariance used in this thesis and distinguishes the concept of shift-invariance used here from a concept that we call shift-consistency. With this terminology in place, section 3.2 introduces the shift-invariant model studied and used throughout this work. This model is based on the linear model of the previous chapter; however, additional structure is imposed on the matrix A. The inclusion of this structure then leads to a modification of the learning rule used to update the features.

¹ Some of the material in this chapter has previously been published in [10].
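One common way to impose such structure on A, shown here only as an illustrative sketch and not necessarily the exact construction of section 3.2, is to fill its columns with shifted copies of a small set of features. The product A s is then equal to a sum of convolutions of each feature with its own coefficient sequence. The feature values, signal length and function name `shift_invariant_dictionary` below are hypothetical.

```python
import numpy as np

def shift_invariant_dictionary(features, signal_len):
    """Stack, for every feature, all shifted copies that fit inside a
    signal of length signal_len as columns of one matrix."""
    blocks = []
    for f in features:
        n_shifts = signal_len - len(f) + 1
        block = np.zeros((signal_len, n_shifts))
        for t in range(n_shifts):
            block[t:t + len(f), t] = f
        blocks.append(block)
    return np.hstack(blocks)

# Two hypothetical features of different lengths (illustrative values).
features = [np.array([1.0, -1.0]), np.array([0.5, 1.0, 0.5])]
signal_len = 10
A = shift_invariant_dictionary(features, signal_len)

# One sparse coefficient sequence per feature; a non-zero entry at index t
# places the corresponding feature at time-location t.
s1 = np.zeros(signal_len - len(features[0]) + 1)
s1[2] = 1.5
s2 = np.zeros(signal_len - len(features[1]) + 1)
s2[6] = -2.0
s = np.concatenate([s1, s2])

# Linear model with the structured matrix ...
x = A @ s
# ... and the equivalent sum of convolutions.
x_conv = np.convolve(s1, features[0]) + np.convolve(s2, features[1])

print("A has shape", A.shape, "but only",
      sum(len(f) for f in features), "free feature parameters")
print("matrix form equals convolution form:", np.allclose(x, x_conv))
```

Because every column of A is tied to one of the features, a learning rule only needs to update the feature samples, which is one way to see why the update differs from the unstructured case.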

The number of features learned and the number of features in the signal are important parameters in the learning process. In section 3.3.1 we analyse this relationship and discuss the advantages offered by the shift-invariant sparse coding model introduced here when compared to the standard sparse coding formulation. In digital signal processing we are dealing with discretised time-series. Often the original time-series is a mixture of features that can occur at continuously shifted time-locations. The effect that sampling of such signals has on the features learned is analysed in section 3.3.2.
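The sampling issue can be illustrated with a small sketch. It assumes a hypothetical Gaussian pulse as the underlying continuous-time feature and compares how well a sampled template matches the same pulse when it is shifted by a whole number of samples and by a fraction of a sample. This only illustrates the effect and is not the analysis of section 3.3.2.

```python
import numpy as np

def sampled_pulse(shift, n=32, width=1.0):
    """Sample a Gaussian pulse centred at n/2 + shift (in samples)."""
    t = np.arange(n)
    return np.exp(-0.5 * ((t - n / 2 - shift) / width) ** 2)

def best_integer_shift_match(signal, template):
    """Largest normalised cross-correlation over all integer lags."""
    corr = np.correlate(signal, template, mode="full")
    return corr.max() / (np.linalg.norm(signal) * np.linalg.norm(template))

template = sampled_pulse(0.0)   # feature sampled on the grid
on_grid = sampled_pulse(3.0)    # shifted by a whole number of samples
off_grid = sampled_pulse(3.5)   # shifted by a non-integer amount

on_grid_match = best_integer_shift_match(on_grid, template)
off_grid_match = best_integer_shift_match(off_grid, template)

print("best match, integer shift:    ", on_grid_match)
print("best match, half-sample shift:", off_grid_match)
```

The integer-shifted pulse is reproduced by a shifted copy of the template, whereas the half-sample shift is not matched exactly by any integer-shifted copy, so a model restricted to integer shifts has to absorb the difference in some other way.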
