About Dynamic Processing in Mainstream Music

(1)

About Dynamic Processing in Mainstream Music

EMMANUEL DERUTY1_{AND DAMIEN TARDIEU}2

1_{The present article accounts for independent research that was conducted while working as a contractor for INRIA /}

METISS, Rennes, France

2_{STMS Lab IRCAM-CNRS-UMPC, Paris, France}

In this article, we study the evolution of music level and music level variation between 1967 and 2011. To do so, we suggest a set of signal features, and examine the impact of limiters and compressors on a corpus of music tracks. As a result, we find that some of the dynamic processes used during music production may be retrieved from the signal. Then, we examine the evolution of studio practices in relation to dynamic processing during the past five decades, as well as the evolution of the relevant signal features. In particular, this results in a characterization of the loudness war. We find for instance that it may have peaked in 2004, resulted in reduction of peak salience, but did not result in any reduction of long-term musical dynamics.

0 INTRODUCTION

Music level and music level variation have been much debated in recent years. In particular, it has been argued that new production techniques, such as dynamic compres-sion and limiting, have favored high loudness tracks with small loudness variations [1],[2]. This trend is referred to as theloudness war, of which [3] (pp. 237–292) and [4] make detailed accounts. This discussion, which has impli-cations in terms of perceived music quality, sound engi-neering practices, and music history, suffers from a lack of precise scientific studies and standardized vocabulary de-spite some recent interesting studies [4]–[7] and tentative normalization from the EBU [8]–[11].

The problem of level variation in music is often discussed using the notion ofdynamic range[7],[12] (p. 128) [13]– [15], which appears to be poorly defined [11]. It some-times refers to micro-dynamics, and somesome-times to macro-dynamics. It has no clear perceptual counterpart and no standardized way of measurement. This lack of definition leads to partially wrong or even contradictory conclusions in several publications, in which the authors confuse crest factor with loudness variability. A case in point can be found in an IEEE Spectrum article entitled “The Future of Music” [2], where the author first describes the crest factor of a song as the difference between its highest points and the average level, calls it “dynamic range,” then proceeds to explain how reducing this “dynamic range” leads to a “constant level of the sound,” where “[it] becomes analo-gous to someone constantly shouting everything he or she says,” by which he now addresses loudness variability. The

same argument is then repeated in famous nonacademic publications such as the Wall Street Journal [16].

These articles are illustrated using the comparison of two waveforms, similar to the ones shown in Fig. 1. The authors consider that waveforms akin to the bottom one, in which peak level appears to be constantly near 0 dBFS, correspond to music in which there are no soft or loud parts. However, such a deduction amounts to considering that constant peak level leads to constant loudness, and therefore that peak level and loudness are highly correlated, which is not the case. Having first confused crest factor with loudness variability, then peak level with loudness, the authors conclude that recent, loud records are devoid of musical dynamics. This is by no means an isolated point of view [1],[17], and only a few publications reason otherwise. In particular, two recent corpus-based studies claim that dynamic variability of modern music has been conserved [5],[18].

This debate should be clarified, and, to begin with, defi-nitions should be provided for the concepts at hand. When it is possible, precise sound descriptors should also be pro-vided, and reference measurements should be performed. Then, and only then, it should be wise to compare music from before and after the loudness war, and try to identify evolutions and trends. Indeed, while music loudness has been the topic of many researches in signal processing and psychophysics, leading to precise definitions and models [19]–[24], music loudness variation has received consider-ably less attention, with, until recently [7],[11], very few dedicated articles. Research suggests that loudness varia-tion can be envisioned using microscopic or macroscopic

(2)

Fig. 1. According to several authors, the top waveform corre-sponds to music with dynamics, whereas the bottom one does not. time-scales, resulting in the respective notions of micro-dynamics and macro-micro-dynamics [12] (p. 129), also referred to as density andconsistency [6],[11]. In regards to sig-nal description, micro-dynamics are sometimes associated to the salience of the signal’s peaks [13],[14],[25], which means that if we accept this particular association, then micro-dynamics could be measured using the crest factor or possible derivatives. The problem of macro-dynamics measurement appears to be better defined, having been ad-dressed by the spread of the loudness along the track [5], of which the EBU 3342 loudness range is a particular case [11],[26].

This article discusses music level and music level varia-tion in a systematic manner. Secvaria-tion 1 describes the set of tracks (the corpus) used in all subsequent experiments. In section 2, we examine several signal descriptors measuring music level, macro-dynamics and micro-dynamics. Then, in section 3, we study the influence of dynamic processors on controlled musical signals. This experiment provides ref-erence values for the proposed descriptors that can be used for music analysis. It will be shown that in many cases, it is possible to quantify the kind and the amount of dynamic processing applied to a particular track, which amounts to performing a kind of reverse sound engineering. Section 4, in which we measure the evolution of the different descrip-tors from tracks released between 1967 and 2011, contains perhaps the most significant part of the paper. Using the reference values from the previous section, along with their relations to particular dynamic processing practices, this analysis will lead us to a precise characterization of the loudness war. Most notably, it will be shown that whereas the loudness war may have a strong influence on micro-dynamics, it has no influence on macro-micro-dynamics, which means that recent music tracks still feature both loud and quiet segments. We will also show evidence of a possible end of the loudness war around 2004.

It is important to point out that this article does not deal with the practice of remastering, which has been criticized for altering the original content of classic tracks [18], and would require a dedicated study.

1. CORPUS

The music corpus we use in this article is made of 4500 tracks released between 1967 and 2011, each year corre-sponding to 100 tracks. These tracks are taken from albums that got considerable “commercial” and/or “critical” suc-cess according to two different online sources [27],[28], and chart archives from the US Billboard [29]. While this method of selection does not lead to a random sample, it ensures that the corpus is based on music that is broadly listened to. Each album from the corpus was verified as featuring a mastering that could realistically be performed at the time of the initial release – excluding as a remastered edition, for instance, obvious digital brickwall limiting on a 1970 album. We choose to start the corpus at the end of the sixties, for the reason that we consider these years as the advent of the contemporary pop/rock era, characterized with the creative use of the recording studio [3] (p. 157) along with mass media availability [30].

Some experiments we perform in this article require the use of a specific sub-corpus, made from tracks that were not processed with a brickwall look-ahead limiter. From our main corpus, we randomly choose 10 songs from each year between 1969 and 1989. As no brickwall limiters were used in the studio before the beginning of the 1990s [3](pp. 279– 280), and as the tracks were manually checked as not being remastered, this ensures that no brickwall limiting was ap-plied on any of the 210 resulting tracks.

2 SIGNAL DESCRIPTORS 2.1 Level and loudness

We evaluate the physical level of the tracks by using RMS level as a global descriptor. When it comes to loudness, sev-eral algorithms have been developed. Some are based on psychoacoustic models [20]–[24], others are energy-based measures [9]. Throughout this paper, for evaluation of short-term loudness, we use the energy-based K-weighting algo-rithm proposed by the ITU [9]. For evaluation of loudness made on entire tracks, we use the K-weighting based “in-tegrated loudness” measure suggested in EBU Tech 3341 and EBU R 128 [8],[10]. Such choices are motivated by the fact that K-weighting has been shown to be similarly robust for music compared to more complex measures that may include detailed perceptual models [31], while being implemented in several commercial products, such as the Waves Loudness Meter1_{, the Flux Pure Analyzer}2_{, and the} Trinnov Loudness Meter3_.

2.2 EBU3342 Loudness Range

The EBU has defined a global measure of loudness vari-ability called LRA (for Loudness RAnge) [11],[26]. The signal is K-weighted according to the ITU-1770 loudness model, gated using absolute then relative gating, and split

1_{www.waves.com/content.aspx?id}₌₁₁₈₈₄

2_{www.fluxhome.com/products/analyzer modules/pas}

spectrum/

(3)

into 3s long windows. The RMS of each window is then evaluated. The LRA is defined as the difference between the estimates of the 10th and the 95th percentiles of the RMS distribution.

By evaluating the difference between two percentiles of the RMS based on 3s long windows and a hop size of 1s, the EBU3342 LRA will measure the loudness variations whose time-scale lies above 2s. This makes the EBU3342 LRA a descriptor of the variation of loudness on a macroscopic time-scale. It should then be used to measure the long term variability of loudness, high values showing that there are some very quiet parts and some very loud parts in a track [11],[26]. It is not in the scope of this particular article to link signal-related notions to percepts (an approach that’s for instance considered in [7]). Nevertheless, the EBU3342 LRA appears to be a suitable feature for the evaluation of musical dynamics in the classical sense, such aspianissimo tofortissimo.

2.3 High Level Sample Density

We introduce a global descriptor calledHigh Level Sam-ple Density(HLSD). It measures the proportion of samples above−1 dB Full Scale (dBFS) after normalization. Let s(n) be the audio signal with s(n)∈[−1, 1]. The signal is normalized and expressed in dBFS:

sdB(n)=20∗log10 _| s(n)| max(|s(n)|) (1) And HLSD is defined as:

HLSD(s)=log₁₀(Card ({n|sdB(n)>−1})) (2) In equation (2), Card(. . .) provides the number of ele-ments n for which sdB(n) > −1, that is, the number of samples whose level is greater than−1 dBFS. In section 3, HLSD will be shown to be a good measure of the amount of brickwall limiting applied to the audio content.

2.4 Crest Factor

The crest factor is the ratio of the instantaneous peak value of the signal level to its time-averaged value [15],[32]. We investigate two different methods for its evaluation.

2.4.1 Method 1

In the studio, a practical method to estimate the crest factor can be based on the difference between the readings of a Peak Program Meter and a VU meter [33],[34]. To emulate such a process, method 1 will consist in the evalu-ation of the level RMS on a series of 300 ms windows. The choice of this particular value is based on the observation according to which 300 ms is the typical integration time of a VU-Meter [35]. Evaluation of the peak level will feature a 10 ms integration time, with a subsequent decay of 24 dB in 2.8s. Such values respectively correspond to the attack and return times of a Peak Program Meter [36].

2.4.2 Method 2

We investigate the possibility that consists in estimating the crest factor by evaluating both peak and RMS levels on a

series of overlapping windows. Letsw(n) be the windowed audio signal with sw(n) ∈ [−1,1]. Let N be the window length in samples. The short-term crest factor can be written as: sdB(n)=20∗log10 |s(n)| max (|s(n)|) (3) The crest factor as a global descriptor will then be CFglobal = median(CFw). Such a method is simpler than method 1, and proves to be much less CPU-intensive. Eval-uation performed over our corpus (section 1) shows that the maximum correlation between both methods is reached when method 2 relies on 0.47s windows. In that case, cor-relation isρ(4498)=0.96,p<10−3. Therefore, during this paper, we will be using method 2 with 0.47s windows.

2.4.3 Usage of the Crest Factor

In some publications, the crest factor is used to describe macro-dynamics, that is, the presence of loud and quiet parts in the signal [2],[13],[16]. Given the very defini-tion of the crest factor, this is obviously not adequate. The confusion may be due to the use of the expressiondynamic rangefor both macro- and micro-dynamics in the literature. Indeed, the crest factor can be a good measure of micro-dynamics of a signal, measuring the saliency of the peaks. But we argue that it is not always the case. For instance, lets consider the following signal:

∞

n₌1

cos (2∗pi∗n∗F0) (4)

This signal, despite a complete absence of any kind of amplitude variation, has a crest factor of 15 dB, a value that should correspond to very high micro-dynamics. Indeed, from our 4500-track corpus, only 18 tracks possess a crest factor that’s equal or above 15 dB. Also, the crest factor is sometimes used to measure the amount of dynamic pro-cessing applied to a track [13]. Again, we think that this use is only partially correct. Indeed, consider a sustained clarinet sample. It features a low crest factor (between 5 and 6 dB, that is, 2 dB lower than most tracks from Metallica’s highly compressed “Death Magnetic” [16]) despite the fact that no dynamic compression is applied. Given these con-siderations, we propose a new measure of micro-dynamics, directly aimed at measuring the amount of dynamic pro-cessing applied.

2.5 Peak to RMS Regression Coefficient

We measure the RMS and peak levels corresponding to the waveforms shown in Fig. 1. These levels are shown in Fig. 2. In “Seaside Rendez-vous,” peak and RMS evolve conjointly, whereas in “Californication,” they do not. This observation stands as the basis for thePeak to RMS Regres-sion Coefficient (PRRC), a signal descriptor we introduce to describe micro-dynamics. Intuitively, a PRRC close to 1 (Fig. 2 left) will correspond to a signal where RMS and peak levels evolve together, whereas a PRRC close to 0 (Fig. 2 right) will correspond to a signal where the peak levels are stable regardless of the behavior of the RMS

(4)

50 100 150 200 250 0 Time (frames) Level (dB FS) Time (frames) Level (dB FS) Peak level RMS level 100 200 400 500 600 0 Peak level RMS level

Fig. 2. Two different ways of controlling musical dynamics. On the left, a track with few dynamic processing, leading to a Peak to RMS Regression Coefficient close to 1 (0.79). On the right, a highly processed track leading to a Peak to RMS Regression Coefficient much closer to 0 (0.26).

level. The PRRC is intended to describe the amount of dy-namic processing applied to a source. It is closely related to a previous notion we have been referring to as “paradigm of musical dynamics” [18].

Short-term PRRC as a signal descriptor is evaluated by computing the regression coefficient between the short-term RMS and the short-short-term peak level of a signal, both expressed in dB. The descriptor is calibrated so that me-dian (PRRC) ≈ 1 in the case of vocals recorded in an anechoic chamber, and median (PPRC) gets close to 0 in the case of highly limited audio content. We observe that dynamic compression and limiting will decrease PRRC values, while dynamic expansion and gating will increase them. This suggests that the PRRC is a better measure of the amount of dynamic processing that was applied to the source than the crest factor, with PRRC=1 corresponding to the highest degree of a perceptual aspect of the signal we want to call “dynamic naturalness.”

While it is true that for such a descriptor, correlation between RMS and peak levels might also be considered, we find that in practice, linear regression leads to more reliable results. Finally, to evaluate PRRC as a global descriptor, we use the median of the short-term PRRC.

3 INFLUENCE OF DYNAMIC PROCESSING 2.1 Methodology

We want to assess the influence of dynamic processing on the features described in section 2. We start with the non-limited sub-corpus described in section 1. The tracks from this sub-corpus were processed using look-ahead brick-wall limiters and compressors. All plug-ins were hosted inside Reaper4_{. All tracks were normalized prior and} af-ter processing. We proceeded with the experiment using

4 _{www.reaper.fm/}

the following compressors: Waves H-Comp5_{, Sonnox} Ox-ford Dynamics6_{, Sonalksis SV-3157}7_{, and URS 1970}8_{, as} well as the following limiters: Waves L29_{, Sonnox Oxford} Limiter10_{, Thomas Mundt’s Loudmax}11_{, and Blue Cat’s} Protector12_{. Remarkably, all tested compressors led to} sim-ilar results, and so did the limiters.

As far as compressors are concerned, we only present the results for the Waves H-Comp. Such a choice is motivated by the “analog feel” of the compression algorithm, along with its predictability during use. The amount of compres-sion is varied by using six combinations of jointly changing ratio and threshold. We perform two processing sessions, using a fast attack (0.5 ms) in one case, and a slow attack (50 ms) in the other. All remaining settings are set to default (Punch to 0, Analog to 2, Release to 100 ms). There is no make-up gain; the file is normalized after processing. This leads us to two groups of 1470 tracks (210 original tracks, plug-in bypass + 6 compression settings).

As far as limiters are concerned, we only present the results for the Waves L2, which is the more widespread of the four limiters according to personal experience. The amount of limiting is varied by changing the threshold from 0 dB to−18 dB. All other settings are set to default (Release to “ARC Mode,” Quantization to “24 bits,” Dither to “Type 1,” Shaping to “Normal”). In this configuration, the make-up gain is the inverse of the compression gain. The file is normalized after processing. This leads us to a total of 4200

5_{www.waves.com/Content.aspx?id}₌₉₁₁₁ 6_{www.sonnoxplugins.com/pub/plugins/products/dynamics.} htm 7_{www.sonalksis.com/sv315.htm} 8_{www.ursplugins.com/urs1970.html} 9_{www.waves.com/Content.aspx?id}₌₂₁₁ 10_{www.sonnoxplugins.com/pub/plugins/products/limiter.htm} 11_{loudmax.blogspot.com/} 12_{www.bluecataudio.com/Products/Product Protector/}

(5)

Fig. 3. L2 limiter influence on RMS level depending on the threshold (top). H-Comp compressor influence on RMS level varying jointly ratio and threshold, with fast attack (middle) and slow attack (bottom). On this Figure as on all similar Figures, the center horizontal line represents the median, the two surrounding horizontal lines represent the upper and lower limits of the interquartile range. The outer whiskers represent the highest and lowest values that are not outliers. Outliers, represented by “+” signs, are values that are more than 1.5 times the interquartile range.

tracks, 20 tracks for each of the 210 songs from the reference corpus (1 with the plug-in bypassed + 19 processed). 3.2 Influence on RMS and EBU3341 Integrated Loudness

Figs. 3 and 4 illustrate the influence of limiting and com-pression on the overall RMS level and EBU3341 integrated loudness of the sub-corpus tracks. Limiting the signal raised the RMS and loudness until the threshold was lowered to ca.−15 dB, after which RMS and loudness tended to stabi-lize. Fast-attack compression also increased both features until a particular compression amount was reached, after which RMS and loudness started decreasing. Slow-attack compression had a different influence altogether, and de-creased both features from the start: the more slow-attack compression, the lower the level and loudness. These re-sults suggest that high levels and loudness values can only be reached using limiters, not compressors.

3.3 Influence on Crest Factor

Limiting the signal resulted in a crest factor decrease un-til the threshold was lowered to ca.−14 dB, after which the crest factor tended to stabilize (Fig. 5). Fast-attack

com-pression decreased crest factor values until a certain amount of compression was reached, after which these values re-mained stable. Slow-attack compression slightly increased crest factor values. These results show that low crest fac-tor values can be reached by using either a limiter or a fast-attack compressor.

In Fig. 6, we can see that limiting had no influence on LRA for thresholds higher than−5 dB. For higher nega-tive values, LRA slowly decreased. Fast-attack compres-sion also decreased LRA. The influence of slow-attack compression is more complex: at first, LRA values were stable, then decreased, then increased again when the com-pression amount was higher. These results show that low LRA values can be reached using any of the three process-ing methods (limitprocess-ing, fast-attack compression, slow-attack compression), but that the amount of processing applied must be large to obtain a significant reduction.

3.5 Influence on HLSD

Fig. 7 illustrates the influence of limiting and com-pressing on HLSD. As far as limiting is concerned, it

(6)

Fig. 4. L2 limiter influence on EBU3341 integrated loudness depending on the threshold (top). H-Comp compressor influence on EBU3341 integrated loudness varying jointly ratio and threshold, with fast attack (middle) and slow attack (bottom).

was possible to witness a similar phenomenon as with RMS, EBU3341 loudness and crest factor: HLSD increased until the threshold was lowered to ca.−14 dB, after which HLSD tended to stabilize. The influence of fast-attack and slow-attack compression was much less significant. Both fast and slow-attack compression produced a small decrease of HLSD (slow-attack compression having no influence on HLSD in the case of the H-Comp). These results show that high HLSD values can only be reached by the use of a limiter and that HLSD is a good predictor of the amount of limiting applied to the track.

3.6 Influence on PRRC

Similarly to what was witnessed with limiter influence on RMS, EBU3341 loudness, crest factor, and HLSD, limit-ing led to PRRC decrease until a particular limitlimit-ing amount was reached (−15 dB in this case), after which the PRRC appeared to be stabilizing (see Fig. 8). Fast-attack compres-sion led the PRRC to decrease in any case. The influence of slow-attack compression was more complex: the values were stable at first and then increased. These results show that low PRRC values can be reached both by the use of lim-iting and fast-attack compression and that PRRC is a good predictor of the amount of dynamic processing applied to a track.

3.7 Clipping

In the present section, we did not specifically study the influence of clipping on the different signal descriptors we use, even though it is sometimes seen as an important aspect of the loudness war [3] (p. 254). There are several reasons for this omission. First, as far as we can tell, there is not a single track in our corpus that uses clipping as a mastering technique per se. When clipping is heard, it is on specific, mostly percussive instruments such as the kick drum. This makes clipping a sound design technique, not a mastering problem. As such, it doesn’t belong to the scope of this study.

There are indeed some cases where the use of clipping on particular instruments does appear to be used to raise the general level of a track [3] (p. 280). These cases typi-cally correspond to specific music styles [3] (p. 281) [12] (p. 111), in which clipping goes alongside a heavily limited mix, making it hard to distinguish between the influences of clipping and hard limiting. This leads us to consider such clipping as brickwall limiting with a high distortion rate. In this regard, from the point of view of dynamics, clipping is a particular case of limiting.

3.8 Summary

The main finding of this experiment is a set of reference values for the behavior of sound descriptors in regards to

(7)

8 10 12 14 16 Crest factor (dB) 8 10 12 14 16 Crest factor (dB) 10 12 14 16 Crest factor (dB) L2 threshold (dB) (B) 0 (B) 1:1, 0dB 1:10, -28dB

Fig. 5. L2 limiter influence on crest factor depending on the threshold (top). H-Comp compressor influence on crest factor varying jointly ratio and threshold, with fast attack (middle) and slow attack (bottom).

limiting and compression. A second important finding is the proof that the proposed new features (PRRC and HLSD) actually measure what they were designed for. From these measurements, we see that it is possible to estimate the kind of dynamic processing used in a track and the parameters of each processor. This can be useful in the field of music information retrieval, for instance in the context of decade prediction (prediction of the decade in which a song was released). Indeed, as seen in section 4.1, a heavily limited track cannot have been produced before 1990. It can also be useful in the field of signal processing algorithm de-sign, for instance for automatic loudness normalization, to characterize the production of a track [37] or for sound engineers.

More specifically, we can seek out the answer to the following question: if one wants to reach unusual descriptor values, what kind of processing can one use? High RMS and EBU3341 integrated loudness values can be reached using a limiter. Low crest factor values can be reached using either a limiter or a fast-attack compressor. Low EBU3342 loudness range values can be reached using any one of the three methods, given that the amount of processing is significant. High HLSD values can be reached using a limiter. Low PRRC values can be reached using either a limiter or a fast-attack compressor. Conversely, we can deduce the following:

r _{A track with notably high RMS/loudness and/or HLSD} values has been limited.

r _{A track with notably low crest factor and/or PRRC values} but normal RMS/loudness and/or HLSD values has been compressed using a fast attack.

r _{A track with notably low EBU3342 values but normal} values otherwise has been compressed using a slow at-tack tracks.

4 EVOLUTION OVER THE YEARS (LOUDNESS WAR, 1989–2004)

On the basis of the previous reference measures and ob-servations, we can now analyze tracks in the corpus in terms of dynamic processing and observe the evolution of studio practices over the past 5 decades. In this aim, we observe the behavior of each descriptor according to the track’s date of release, 5 years by 5 years. For readability reasons, we will designate each group of 5 years by its central year. For in-stance, the 1967–1971 box will be referred to as “1969.” To provide context to the experiment, we provide a timeline of milestones in the evolution of studio practices. These milestones will prove to have a very strong influence on the musical signal.

(8)

0 5 10 15 20 EBU3342 lo u

dness range (LU)

0 5 10 15 20 0 5 10 15 20 EBU3342 lo u

dness range (LU)

L2 threshold (dB)

(B) 0

(B) 1:1, 0dB 1:2, -5dB 1:35, -44dB

H-Comp settings. Top, attack=0.5ms. Bottom, attack=50ms.

Fig. 6. L2 limiter influence on EBU3342 loudness range depending on the threshold (top). H-Comp compressor influence on EBU3342 loudness range varying jointly ratio and threshold, with fast attack (middle) and slow attack (bottom).

4.1 Studio Practice Timeline

We grouped technical innovations and studio practices into two periods: 1960s–1983 and 1985–1997. Elements from the first period all move toward better transient resti-tution and high-fidelity while changes and innovations from the second period, on the contrary, point toward a form of low-fidelity, heavy processing, and high loudness.

4.1.1 From the 1960s to 1983

r Sixties: The Beatles use “bouncing”13 _extensively [3] (p. 158).

r Late sixties: advent of eight-track machines [3] (p. 159). The more numerous the tracks, the less “bouncing” is necessary, the less degraded the transients.

r _{Ca. 1970: 16-track machines, solid-state technology [3]} (p. 160). Solid-state technology being more transpar-ent than the previously used tube technology, it leads to

13_{“Bouncing” is a technique that consists into emulating} multi-track recording with stereo recorders. A part will be recorded on track one, a second one on track two. Tracks one and two are “bounced” together to track one, so a third part can be recorded on track two. . .and so on. This technique goes at the price of transient degradation [3] (p. 161), thus leading to a potential loss of crest factor and increasing of PRRC.

clearer transients and potentially higher crest factor, and lower PRRC.

r _{Ca. 1981: end of the “bounce” practice [3] (p. 161).} r _{1983: Sony DASH multi-track digital recorders [38].}

Digital tape recorders are more transparent than analog tape recorders, leading again to clearer transients. r _{1983: advent of the CD [30] (p. 152), which possesses a}

higher signal to noise ratio than vinyl.

4.1.2 From 1983 to 1997

r Ca. 1985: first “all-in-one” radio processor, an important by-product of the “radio loudness war” [3] (p. 277). r _{1989: “lo-fi” indie bands [3] (p. 181). The “lo-fi” trend}

implies the use of vintage gear, which degrades tran-sients.

r _{1991–1994: radio processors find their way into the} stu-dio [3] (pp. 279–280). This makes high loudness easier during production.

r _{Ca. 1997: DAWs (Digital Audio Workstation(s)) are} pre-ferred over tape systems [3] (p. 294). Makes audio pro-cessing and more particularly high loudness even easier: all you need to do is to select a plug-in.

4.2 RMS, Loudness, HLSD

As seen in Fig. 9 and 10, RMS and loudness evolve similarly: slight decrease between 1969 and 1984, steady

(9)

HLSD

0

Fig. 7. L2 limiter influence on HLSD depending on the threshold (top). H-Comp compressor influence on HLSD varying jointly ratio and threshold, with fast attack (middle) and slow attack (bottom).

increase between 1984 and 2004, and finally slight decrease between 2004 and 2009. This suggests that the loudness war can be situated between the end of the 1980s and the mid-2000s. As seen in Fig. 11, this is confirmed by the evolution of the HLSD. A year-by-year analysis of the three descriptors shows that the loudness war can be situated more precisely between 1988/1990 and 2004/2007.

A joint observation of Figs. 9–12 indicates that while RMS level and loudness have been decreasing between 1969 and 1984, HLSD has not. According to the conclu-sions drawn in section 3.8, such a combination can be as-sociated to a decreasing use of fast-attack compression in the production chain. Such a change in recording practice may be explained by the fact that when using analog sys-tems with a low signal-to noise ratio, according to personal experience, audio engineers will resort to two solutions. First, they will try to record at a high level so background noise is minimized. This will result in transients whose level neighbors the upper capabilities of the recording sys-tem, in which case its behavior will not be linear anymore. An input increase of say 6 dB will result into a transmitted increase of less than 6 dB. This phenomenon is none other than fast-attack dynamic compression. Then, they will want to process the signal prior to recording so that transients are less salient. This also means using fast-attack compression, since no digital limiters were available during the 1960s. Increase of signal-to-noise ratio between the 1960s and

1984 may therefore very well result into less fast-attack compression.

4.3 Crest Factor

The crest factor shows a similar evolution (see Fig. 12), with the notable exception that it keeps decreasing between 2004 and 2009. The global decrease in crest factor due to the loudness war is about 2.0 dB, which corresponds, according to Fig. 5, to the average crest factor loss that would be encountered when setting the Waves L2’s threshold slider near−6.5 dB.

As seen in Fig. 13, the EBU3342 loudness range follows a completely distinct evolution, and does not appear to be significantly influenced by the loudness war. We had reached similar results using a smaller corpus in [18], and it was later confirmed with a much larger one in [5]. If we accept the EBU3342 LRA as a measure for macro-dynamics, then the usual preconception according to which the loudness war is responsible for making all mainstream music parts sound equally loud [1],[2] is wrong, for the simple reason that recent mainstream music is nearly as “macro-dynamic” as it was 30 years ago.

Fig. 14 shows log(1 + LRA) instead of LRA. This re-sults in very few outliers and a normal distribution. This

(10)

0 0.2 0.4 0.6 0.8 1 1.2 PRRC 0 0.2 0.4 0.6 0.8 1 1.2 PRRC 0.5 1 1.5 2 PRRC L2 threshold (dB) (B) 0 (B) 1:1, 0dB 1:2, -5dB 1:5, -18dB 1:10, -28dB

H-Comp settings. Top, attack=0.5ms. Bottom, attack=50ms.

Fig. 8. L2 limiter influence on PRRC depending on the threshold (top). H-Comp compressor influence on PRRC varying jointly ratio and threshold, with fast attack (middle) and slow attack (bottom).

Year of Release

RMS (dBFS)

Fig. 9. RMS level evolution. On this Figure as on all similar Figures, the center horizontal line represents the median, the two surrounding horizontal lines represent the upper and lower limits of the interquartile range. The outer whiskers represent the highest and lowest values that are not outliers. Outliers, represented by “+” signs, are values that are more than 1.5 times the interquartile range.

(11)

HLSD

Year of Release Fig. 11. HLSD evolution.

suggests that the EBU3342 loudness range could be ex-pressed in log (LU) instead of LU. The evolution graph is also more readable this way. We note that macro-dynamics as measured by the LRA have been generally decreasing between 1969 and 1989, and then increasing for the first 15 years of the loudness war. They decrease again between 1999 and 2004, and increase once more between 2004 and 2009. Overall, it is impossible to observe a regular trend in the evolution of the LRA during the period of the study. Such a resilience of the LRA to the loudness war might be related to its resilience to dynamic processing as observed in section 3.4.

4.5 Summary

We now sum up the main observations we have been making about the signal during this section.

1) The loudness war can be seen as starting near 1989, and peaking near 2004. Music level appears to be stable since 2004.

2) There is a local minimum of loudness during the 1980s.

3) If we compare mainstream music from before the loudness war with mainstream music from during and after it, the loudness war is responsible for an RMS level increase of 4.0 dB, a loudness increase of 3.1 LU, and a crest factor decrease of 1.4 dB.

4) If we look for descriptor variations between 1989 and 2004, then the loudness war can be accountable for an RMS level increase of 5.0 dB, a loudness increase of 4.0 LU, and a crest factor decrease of 2.0 dB. 5) In order to limit music tracks from before the loudness

war so that they sound as loud as music released near 2004, the limiter threshold should be set near−6 dB.

6 8 10 12 14 16 Crest Factor (dB) Year of Release 1967-1971 1972-1976 1977-1981 1982-1986 1987-1991 1992-1996 1997-2001 2002-2006 2007-2011

Fig. 12. Crest factor evolution.

10 5 0 15 20 EBU 3342 Lo u

dness Range (LU)

1967-1971 1972-1976 1977-1981 1982-1986 1987-1991 1992-1996 1997-2001 2002-2006 2007-2011 Year of Release

(12)

0 1 2 3 log(1+EBU 3342 Lo u dness Range) 1967-1971 1972-1976 1977-1981 1982-1986 1987-1991 1992-1996 1997-2001 2002-2006 2007-2011 Year of Release

Fig. 14. Evolution of log(1 + LRA). These observations are consistent with the technical

timeline provided in section 4.1, according to which there was an increasing use of limiting between the beginning of the 1990s and 2004. The existence of “lo-fi” indie bands near 1989 suggests that the growing taste in degraded audio signals that can be observed after 1984 may not be solely the loudness war’s responsibility, but also the result of an un-derlying stylistic tendency. This strongly suggests that the loudness war is a DAW-related phenomenon (easy plug-in plug-instantiation) that adds up to an underlyplug-ing, legitimate, stylistic/technical trend (low fidelity).

Fig. 15 shows these events along with the PRRC, whose evolution nicely sums up the observations made in sec-tion 4. In 2008, a series of articles are published in main-stream journals [1],[16],[39], that unanimously expose the use of what they refer to as compression” or “hyper-limiting.” Having such considerations published in the mainstream press is consistent with the observation accord-ing to which the loudness war has been peakaccord-ing four years before.

We can sum up the timeline by dividing it into four eras. 1967-1971 1972-1976 1977-1981 1982-1986 1987-1991 1992-1996 1997-2001 2002-2006 2007-2011 Year of release 0 0.5 1 1.5 PRRC

ca. 1985: first radio processor1991-1994: radio-like processors in the st

udio

ca. 1997: DAWs are preferred over tape systems

1983: Sony DASH m

ultitrack digital recorders

ca. 1970: 16-track machines, solid-state

late sixties: eight-track machines 2008: Lo

udness War catches

media attention

(13)

(1) 1967–1984: music production tended toward high-fidelity and transparency, thanks to technological in-novations that preserve transients. Notice the high pro-portion of tracks featuring a PRRC greater than 1 near the beginning of the 1980s, which is consistent with the wide use of noise gates around that time [3] (pp. 165– 170).

(2) 1984–1989: an opposite tendency towards low-fidelity can be observed, even though no limiters were used yet. It has apparently nothing to do with tech-nological changes, and seems to be a purely stylistic issue.

(3) 1989–2004: loudness war era, limiters were increas-ingly used.

(4) 2004–2011: loudness war awareness and stabiliza-tion, all descriptor evolutions but the crest factor’s were slightly reversed.

5 CONCLUSIONS

Several signal descriptors in relation to music level and music level variation have been considered. Among these, the crest factor appears to be of particular concern, an ob-servation that is also made by [11]. It does not appear to measure the amount of dynamic processing applied to the signal, which leads us to suggest another signal descrip-tor for that purpose, the “Peak to RMS regression coeffi-cient,” or PRRC. Then, the crest factor may not be a good measure for micro-dynamics, that is, short-term dynamic variability, given that this particular concept is even mean-ingful. Indeed, whereas substantial effort from TC Electron-ics and the EBU’s part makes it possible to use standardized descriptors in relation to macro-dynamics [7,11], no such work exists in regards to micro-dynamics. In our opinion, this is a concept that should be clarified. More generally, when it comes to music level variation, we think that the very concepts should be defined much more carefully than they currently are. This is particularly urgent in the case of the “dynamic range,” which we believe should be put aside altogether.

The most significant findings of the article concern the evolution of the suggested descriptors on a corpus spanning over the last five decades. As a result, we observe that from 1967 to 1984, mainstream music production tends toward high-fidelity and transparency, thanks to different techno-logical innovations. From 1984 to 2004, on the contrary, it tends toward low-fidelity and transient degradation. This tendency appears to be starting as a purely aesthetic issue, then turns near the end of the 1980s into the loudness war, in which limiters play a very important role. The loudness war appears to peak in 2004, and a modest movement toward the opposite direction can be observed.

While the loudness war has indeed made mainstream music louder, transients less salient, decreased “natural-ness,” macro-dynamics remain practically untouched. In other words, there are still pianissimi and fortissimi in re-cent mainstream music, a conclusion that we provided in [18] and that was confirmed by [5]. Therefore, the origin for the “ear fatigue” phenomenon that is sometimes associated

to modern music [40],[41] does not lie with the absence of musical dynamics. Instead, it may be related to micro-dynamics, but this is a concept that is so poorly defined, that it seems a long way before one can find solid relations between the notion of micro-dynamics, a precise set of au-dio descriptors, and a well-defined percept. We hope that this article will contribute to the ongoing debate.

6 REFERENCES

[1] K. Mastersonn, “Loudness war stirs quiet revolution by audio engineers,”Chicago Tribune, 2008.

[2] S. Sreedhar, “The future of music,”IEEE Spectrum, August 2007.

[3] G. Milner,Perfecting sound forever: An aural history of recorded music. Faber & Faber, 2009.

[4] E. Vickers, “The loudness war: Background, spec-ulation and recommendations,” inAES 129th Convention, San Francisco, CA, pp. 4–7, 2010.

[5] J. Serrà, Á. Corral, M. Boguñá, M. Haro and J. Arcos, “Measuring the evolution of contemporary western popular music,”Scientific Reports, vol. 2, 2012.

[6] E. Skovenborg and T. Lund, “Loudness descriptors to characterize programs and music tracks,” Proceedings of the AES 125th Convention, 2008.

[7] N. B. H. Croghan, K. H. Arehart, and J. M. Kates, “Quality and loudness judgments for music subjected to compression limiting,”Journal of the Acoustical Society of America, vol. 132, no. 2, pp. 1177–1188, August 2012.

[8] “EBU - TECH 3341. Loudness metering: ‘EBU mode’ metering to supplement loudness normalisation in accordance with EBU R 128.”EBU/UER, August 2011.

[9] ITU, “BS.1770-2: Algorithms to measure audio pro-gramme loudness and true-peak audio level,” 2011.

[10] EBU/UER, “EBU - recommendation R 128. Loud-ness normalisation and permitted maximum level of audio signals,” 2011.

[11] E. Skovenborg, “Loudness range (LRA) – design and evaluation,”AES 132nd Convention, Budapest, Hun-gary, 2012.

[12] B. Katz and R. Katz,Mastering audio: the art and the science. Focal Press, 2007.

[13] P. M. Foundation, “Our aim.” Retrieved (2011 July) from dynamicrange.de/en/our-aim.

[14] F. Sound and P. Development, “Pure analyzer metering option.” Retrieved (2011 Oct.) from flux-home.com/products/analyzer.

[15] E. Vickers, “Metrics for quantifying loud-ness and dynamics.” Retrieved (2011 Oct.) from http://www.sfxmachine.com/docs/loudnesswar/metrics for quantifying loudness and dynamics.pdf, 2010.

[16] E. Smith, “Even heavy-metal fans complain that today’s music is too loud!!!,”Wall Street Journal., pp. 9– 26, September 2008.

[17] R. Rowan, “Over the limit,”ProRec.com, 2002. [18] E. Deruty, “Dynamic range and the loudness war,” Sound on Sound, pp. 22–24, September 2011.

(14)

[19] E. Zwicker, H. Fastl, U. Widnmann, K. Kurakata, S. Kuwano, and S. Namba, “Program for calculating loudness according to DIN 45631 (ISO 532b),” Journal of the Acoustical Society of Japan (E), vol. 12, no. 1, pp. 39–42, 1991.

[20] B. Bauer, E. Torick, A. Rosenheck, and R. Allen, “A loudness-level monitor for broadcasting,” Audio and Electroacoustics, IEEE Transactions, vol. 15, no. 4, pp. 177–182, 1967.

[21] ISO, “532, acoustics - method for calculating loud-ness level,” International Organisation for Standardisa-tion, 1975.

[22] B. Jones and E. Torick, “A new loudness indicator for use in broadcasting,”SMPTE Journal, vol. 90, no. 9, pp. 772–777, 1981.

[23] B. Moore and B. Glasberg, “A revision of Zwicker’s loudness model,” Acta Acustica united with Acustica, vol. 82, no. 2, pp. 335–345, 1996.

[24] M. Stone, B. Moore, and B. Glasberg, “A real-time DSP-based loudness meter,”Contributions to Psychologi-cal Acoustics, 1997.

[25] Pleasurize Music Foundation, “Unofficial dynamic range database.” Retrieved (2011 Aug.) from dr.loudness-war.info.

[26] EBU/UER,“EBU-TECH3342 Loudness Range: A descriptor to supplement loudness normalisation in accor-dance with EBU R 128,” August 2011.

[27] Wikipedia.org, “List of best-selling albums world-wide.” Retrieved (2011 Mar.) from en.wikipedia.org/wiki/ Best selling albums.

[28] Besteveralbums.com, “Best ever albums.” Re-trieved (2011 Mar.) from besteveralbums.com.

[29] Billboard.com, “Hot 100 chart archives.” Retrieved (2011 Mar.) from besteveralbums.com.

[30] The Cambridge companion to recorded music. Cambridge University Press, 1999.

[31] E. Skovenborg and S. Nielsen, “Evaluation of dif-ferent loudness models with music and speech material,” Proceedings of the AES 117th Convention, 2004.

[32] Telecommunications: Glossary of Telecommunica-tion Terms. National Communications System, 1996.

[33] T. E. S. Engine, “About vu meters.” Retrieved (2011, Sept.) from www.globalspec.com/learnmore/ test measurement equipment/multimeters electrical test me-ters/vu meters.

[34] “VU and PPM audio meters: an elementary expla-nation.” Shure Technical Library, May 2007.

[35] International Electrotechnical Commission, “IEC 60268-17: Sound system equipment - part 17: Peak pro-gramme level meters,” October 1990.

[36] International Electrotechnical Commission, “IEC 60268-10: Sound system equipment - part 10: Peak pro-gramme level meters,” March 1991.

[37] D. Tardieu, E. Deruty, C. Charbuillet, and G. Peeters, “Production effect: Audio features for recording techniques description and decade prediction,”Int. Confer-ence on Digital Audio Effects (DAFx-11), (Paris, France ), September 2011.

[38] “Technology hall of fame: 1981 Sony PCM-3324.”Mix Magazine, September 2006.

[39] S. Michaels, “Death magnetic ‘loudness war’ rages on,”The Guardian, October 2008.

[40] E. Vickers, “The Loudness War: Do Louder, Hy-percompressed Recordings Sell Better?,” Journal of the Audio Engineering Society, vol. 59, no. 5, pp. 346–351, 2011.

[41] R. Levine, “The death of high fidelity,” Rolling Stone, December 2007.

THE AUTHORS

Emmanuel Deruty Damien Tardieu

Emmanuel Deruty is graduated from the Conservatoire de Paris (CNSMDP), Paris, France. He has worked as a sound designer for IRCAM, Paris, France and Soundwalk, New York, NY, as a film music composer in Europe and in the US, as a writer for the Sound on Sound magazine, Cam-bridge, UK, and as a consultant in musicology applied to M.I.R. for INRIA, Rennes, France, as well as for IRCAM, Akoustic Arts and Sony Computer Science Laboratory,

Paris, France.

r

Damien Tardieu is cofounder and CEO of Niland, a company that sells music recommendation systems. He has been working as a researcher at IRCAM in Paris, France, and at the Institute for New Media Art in Tech-nology in Mons, Belgium. He holds a PhD in music technology and a master’s degree in signal processing and computer science applied to music from the Paris VI university, as well as a masters degree in communi-cation systems from the Ecole Polytechnique de Lausanne, Switzerland.