Methods for Analysing Large-Scale Resources and Big Music Data


(1)

Methods for Analysing Large-Scale Resources and Big Music Data

Tillman Weyde

Department of Computer Science, Music Informatics Research Group, City University London

(2)

Overview

•  Large-Scale Music Analysis
   •  Big Music Data
   •  Available technologies and methods
•  The Digital Music Lab
   •  Architecture
   •  Data
•  Example
   •  Chord recognition
   •  Chord sequence analysis
   •  Visualisations
•  Hands-on tasks
   •  Exploring British Library content

(3)

The Digital Music Lab Project

Large Scale Music Analysis

•  Music has gone digital on a large scale.
•  What about musicology?
•  What is different about large scale data?

(4)

Large Scale Music Analysis

•  Big music data collections are becoming increasingly available in digital form, e.g.
   •  iTunes and Spotify (and others) offer over 30 million tracks each
   •  The British Library holds ~3 million audio recordings (~10% are digitized), 1.5 million music prints and 100k manuscripts
   •  Internet Archive: 2.5 million audio tracks

(5)

Large Scale Music Analysis

•  Big Music Data in Research
   •  Systematic musicology has developed as "data oriented empirical research"
      Parncutt, R. 2007. Systematic musicology and the history and future of Western musical scholarship. Journal of Interdisciplinary Music Studies, 1, 1-32.
   •  “... working with larger data sets will open up new areas of musicology.”
      N. Cook (2005). ‘The Compleat Musicologist’. Keynote speech at ISMIR.

(6)

Big Data Analysis Workflow

•  Acquisition, Storage and Management
   •  Large Hardware & Software Systems
•  Exploration, Hypothesis development
   •  Query & Search, Visualization
•  Modelling & Testing
   •  Statistical tools

(7)

Big Data Analysis Technology

•  Parallel processing on large arrays
   •  Cheap unreliable hardware
•  Software Architecture
   •  Map/Reduce, Computation Graphs (Hadoop, Spark)
•  Algorithms
   •  Failure-tolerant, efficient, simple
•  Visualisations
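
The Map/Reduce idea named above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Hadoop or Spark code: the track data, `map_track` and `reduce_counts` are hypothetical names introduced here. Each track is processed independently (map), and the partial results are merged with an associative operation (reduce), which is what lets the work be spread over many unreliable machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: one chord-label sequence per track.
tracks = [
    ["C", "F", "G", "C"],
    ["Am", "F", "C", "G"],
    ["C", "G", "Am", "F"],
]

def map_track(chords):
    """Map step: turn one track into partial counts, independent of other tracks."""
    return Counter(chords)

def reduce_counts(left, right):
    """Reduce step: merge two partial results; merging is associative."""
    return left + right

# In a cluster, the map calls would run on different workers.
partial_counts = map(map_track, tracks)
total = reduce(reduce_counts, partial_counts, Counter())
print(total.most_common())
```

Because the reduce step is associative, a failed map task can simply be re-run and merged later without changing the result.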

(8)

Big Music Data Applications (1)

•  Popular:
   •  Animated
   •  Recommendation (network based)

(9)

Big Music Data Applications (2)

•  Not many: e.g. music history graph by Google (since 1950, not classical music)
   •  https://research.google.com/bigpicture/music/

(10)

Digital Transformations in Musicology

•  Challenges
   •  Gap between musicology and music technology (music information retrieval)
   •  Large heterogeneous data collections
•  Need for software infrastructure
   •  Audio and symbolic music processing
   •  Connecting resources (semantic web, linked data)
   •  Tools and visual interfaces

(11)

Open Questions (partly)

•  How can music research use audio transcription and analysis on large data collections?
•  How can we provide an infrastructure that enables researchers to make use of large data collections and create reusable open datasets?
•  How can computational tools be made usable for music researchers, musicians and other users (who are not necessarily computer

(12)

Musicological Questions

From 2014 Workshop

•  Analysing styles, trends over time
•  Work across different heterogeneous collections
•  Utilise external metadata and annotations

(13)

Infrastructure needed

13/03/2015

•  Feature Extraction
   •  Vamp and other plug-ins
   •  Parallelisation
•  Middleware
   •  Semantic Web
   •  Music Ontology
   •  Aggregation and collection level analysis

(14)

Large Scale Music Analysis

•  Plan for this session
   •  Show Digital Music Lab and our approach to large scale music analysis
   •  Workflow and technologies
   •  Present features and interfaces
   •  Hands-on data exploration
   •  Methods for further analysis
   •  Discussion

(15)

The Digital Music Lab

(16)

The Digital Music Lab project

•  January 2014 - March 2015 (small follow-up project running now)
•  City University (Dept of Computer Science, Dept of Music)
   •  Tillman Weyde, Stephen Cottrell, Jason Dykes, Emmanouil Benetos, Daniel Wolff, Dan Tidhar, Alex Kachkaev
•  Queen Mary UoL (Centre for Digital Music)
   •  Mark Plumbley, Simon Dixon, Mathieu Barthet, Steven Hargreaves
•  University College London (Dept of Computer Science, Centre for Digital Humanities)
   •  Nicolas Gold, Samer Abdallah
•  British Library (BL Labs)
   •  Aquiles Alencar-Brayner, Mahendra Mahey, Adam Tovell

(17)

Goals

•  Develop a networked infrastructure to bring computation to the data
•  Avoid copyright problems by design
•  Integrate audio feature extraction and transcription
•  Development of analysis tools
•  Interactive visual interfaces
•  Musicological applications

(18)

Outputs

•  Curated datasets and derived data (>4 Terabytes)
•  Web service with visual interfaces for data exploration
•  Publications (more to come)
•  Redistributable virtual machine images (in preparation)

(19)

The Digital Music Lab

•  Overview

(20)

The DML System Provides ...

•  Access: Systematic exploration of heterogeneous and large music libraries
•  Control: Interfacing with complex automatic music analysis tools
•  Analysis: Gain summarised knowledge on large numbers of recordings
•  Sharing: Experiments reproducible with same data, clear provenance of analysis results.

(21)

The Technical Perspective

•  Access to data
   •  Audio – access restricted by physical location
   •  Metadata – unification of different formats
•  Control via web interface to large-scale analysis
   •  Interactive UI for overview and exploration
•  Scalable analysis is available on collection-level and recording-level
•  Share the well-defined and derived data
   •  Re-use of existing software and published code for analysis

(22)

Software Ecosystem

•  Distributed system
   •  Virtual machines (VirtualBox)
   •  Open Source OS (Ubuntu)
•  Parallelised existing analysis tools
   •  Python (NumPy)
   •  Vamp Plugins
   •  Big-Data map-reduce (Spark)
•  Computation management
   •  Built on semantic architecture
•  Interactive user interface for exploration and analysis
   •  Built using state-of-the-art web technologies

(23)

Data-Flow for Computational Analysis

[Diagram components: User Interface; Web Server; Analysis Management: ClioPatria; Database: Results & Metadata; Computing Server; Audio, Transcriptions and Feature Storage; Access Audio]

(24)

Physical Locations Matter: Content Access

•  Two computing servers, located at BL and ILM
   •  Allow for in-place access to restricted data
•  Dedicated server at City for web access

(25)

Sustainability

•  Preference for Open Source
   •  Basic infrastructure (Ubuntu, Spark, Vamp ...)
•  Soundsoftware repository for
   •  Publishing versioned code of newly developed software
   •  Backup and sharing: Open data / features / results
•  Open and reproducible method
   •  Enables similar set-up in further institutions

(26)

Results Implemented in the DML System

•  Conceptual framework (including implementation) for collection-level analysis
   •  Collection in focus as object of analysis
   •  Data-flow for interactive retrieval of results
•  Secure, responsive and redundant network structure
   •  Distributed computation resources
•  Open-source software ecosystem for large-scale music analysis
   •  Parallelised feature extraction and results management
   •  Collection-level analysis, interface and visualisation

(27)

Feature Extraction

(28)

Audio Descriptors List

1.  Spectrogram
2.  MFCCs
3.  Chroma
4.  Onsets
5.  Speech/Music Segmentation
6.  Chords
7.  Beats/Tempo
8.  Key
9.  Melody
10.  Note Transcription
(29)

Raw Audio

Sample from CHARM: JS Bach, Chorale Prelude - Beloved Jesus, Cohen, Harriet (piano), Columbia, 1935

(30)

Spectrogram

2 versions:

•  STFT magnitude spectrogram
•  Constant-Q Transform magnitude spectrogram
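
As a rough illustration of the first version, an STFT magnitude spectrogram can be computed by windowing short frames and taking the magnitude of each frame's discrete Fourier transform. The sketch below uses only the Python standard library and a naive DFT for readability; the frame size, hop size and test tone are assumptions chosen for this example, and a real system would use an FFT library.

```python
import cmath
import math

def stft_magnitude(signal, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame."""
    # Hann window: reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for clarity; an FFT gives the same values faster.
            value = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            spectrum.append(abs(value))
        frames.append(spectrum)
    return frames

# Test tone (assumed): 1000 Hz sine at 8000 Hz sample rate.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spectrogram = stft_magnitude(tone)

# Bin k corresponds to frequency k * sample_rate / frame_size.
peak_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
peak_hz = peak_bin * sample_rate / 64
```

The Constant-Q variant differs in that its bins are spaced geometrically (a fixed number per octave) rather than linearly, which matches musical pitch more closely.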

(31)

MFCCs

•  Stand for: Mel-Frequency Cepstral Coefficients
•  Extracted using QM Vamp Plugin Set

(32)

Chroma

•  Spectrum projected onto 12 bins (representing semitones of an octave)
•  Extracted using: QM Chromagram and NNLS Chroma Vamp plugins
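
The projection onto 12 bins can be illustrated with a simple folding of spectral energy by pitch class. This is a minimal sketch and not the algorithm of the QM Chromagram or NNLS Chroma plugins; it assumes equal temperament with A4 = 440 Hz, and the toy spectrum values are made up.

```python
import math

def pitch_class(freq_hz, ref_a4=440.0):
    """Pitch class 0..11 (0 = C) of the equal-tempered semitone nearest to freq_hz."""
    midi = 69 + 12 * math.log2(freq_hz / ref_a4)
    return int(round(midi)) % 12

def chroma_from_spectrum(freqs_hz, magnitudes):
    """Sum spectral magnitudes into 12 bins, one per semitone of the octave."""
    chroma = [0.0] * 12
    for f, m in zip(freqs_hz, magnitudes):
        if f > 0:  # skip the DC bin, which has no pitch
            chroma[pitch_class(f)] += m
    return chroma

# Toy spectrum (assumed values): energy at A3, A4 and E5.
freqs = [220.0, 440.0, 659.26]
mags = [1.0, 2.0, 0.5]
chroma = chroma_from_spectrum(freqs, mags)
```

Both A3 and A4 land in the same bin (pitch class 9), which is the octave invariance that makes chroma useful for chord and key analysis.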

(33)

Onsets

•  Onset: the beginning of a musical note or another sound
•  Extracted using QM Onset Vamp plugin

(34)

Speech/Music Segmentation

•  Useful for ethnographic recordings/radio broadcasts
•  Extracted using BBC Speech/Music Segmentation Vamp Plugin

(35)

Chords

•  Extracted using Chordino Vamp Plugin

(36)

Beats

•  Beat locations labelled with metrical position
•  Extracted using Beatroot, Marsyas, Tempotracker Vamp Plugins

(37)

Tempo

•  Estimated based on onset/beat information
•  Extracted using Tempotracker and Tempogram Vamp plugins
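
To make the link between beat information and tempo concrete, the sketch below derives a global tempo and a tempo curve from beat times. It is a simplified illustration, not the method of the Tempotracker or Tempogram plugins; the function names and beat times are hypothetical.

```python
from statistics import median

def tempo_bpm(beat_times):
    """Global tempo: 60 divided by the median inter-beat interval in seconds."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)

def tempo_curve(beat_times):
    """Local tempo at each beat, as (time, bpm) pairs."""
    return [(b, 60.0 / (b - a)) for a, b in zip(beat_times, beat_times[1:])]

# Hypothetical beat times in seconds; the last beat arrives late (a slight ritardando).
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.6]
global_tempo = tempo_bpm(beats)
curve = tempo_curve(beats)
```

Using the median rather than the mean keeps a single late or missed beat from distorting the global estimate.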

(38)

Key

•  Extracted using QM Key Vamp plugin (supports major/minor keys)

(39)

Melody

•  Or more precisely: “Sequence of fundamental frequency (F0) values corresponding to the perceived pitch of the main melody.”
•  Extracted using MELODIA Vamp plugin

(40)

Note Transcription – Semitone Resolution

•  Multiple-pitch detection (onset/offset/pitch/velocity)
•  Extracted using Silvet Vamp plugin
•  Synthesized transcription example:

(41)

Note Transcription – High Pitch Resolution

•  Multiple-pitch detection on a 20-cent resolution – useful for tuning/temperament analysis and analysis of non-Western music
•  Extracted using Silvet Vamp plugin
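
A 20-cent resolution means five pitch bins per semitone, since one equal-tempered semitone is 100 cents. The following sketch shows the underlying arithmetic only; it is not Silvet's implementation, and the reference pitch of 440 Hz and the function names are assumptions for illustration.

```python
import math

def cents_above(freq_hz, ref_hz=440.0):
    """Interval from ref_hz to freq_hz in cents (1200 cents per octave)."""
    return 1200 * math.log2(freq_hz / ref_hz)

def to_bin(freq_hz, bin_cents=20, ref_hz=440.0):
    """Index of the nearest 20-cent bin relative to the reference pitch."""
    return round(cents_above(freq_hz, ref_hz) / bin_cents)

def deviation_from_semitone(freq_hz, ref_hz=440.0):
    """Signed distance in cents from the nearest equal-tempered semitone."""
    cents = cents_above(freq_hz, ref_hz)
    return cents - 100 * round(cents / 100)

# An A tuned to 445 Hz is roughly one 20-cent bin above A4 = 440 Hz.
sharp_a_bin = to_bin(445.0)
sharp_a_dev = deviation_from_semitone(445.0)
```

Deviations like `sharp_a_dev`, collected over many notes, are the kind of quantity a tuning or temperament study could aggregate.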

(42)

Data

•  British Library
•  CHARM
•  I Like Music

(43)

Data - British Library Music Dataset

•  Currently identifying, organising, and curating available music data collections from the BL Sound Archive
•  Over 3M digital audio recordings, in a variety of formats
•  Copyright-cleared material will be made available to the public
•  Copyright-restricted material will be accessible to BL users

(44)

Data – I Like Music Dataset

•  I Like Music: digital music service provider to companies who hold public performance licences
•  Sole provider of online music to the BBC
•  Holds a commercial music library of 1.2M tracks and a production music library of 400k tracks

(45)

Data – CHARM and Mazurka Dataset

•  CHARM: AHRC Research Centre for the History and Analysis of Recorded Music (2004-2009)
•  CHARM Dataset: 5k copyright-free historical recordings (1902-1962) + metadata
•  Mazurka Dataset: 3k recordings + metadata
•  Ideal for musicological analysis using computational methods

(46)

Extracted features available today

•  ILM, BL, CHARM datasets: ~350,000 tracks
•  Transcriptions: MIDI and high resolution
•  Beats and tempo curves
•  Chroma
•  Chord and key

(47)

Infrastructure

•  Feature Extraction
   •  Vamp plug-ins
   •  Spark and other techniques for parallelisation
•  Middleware
   •  Semantic Web server (RDF with Prolog using ClioPatria)
   •  Music Ontology
   •  Manages aggregation and collection level analysis
   •  Provides SPARQL endpoint

(48)

Infrastructure

•  Derived data from 2 collections
   •  Accessible via the web

(49)

Interfaces and Visualisations

•  Audio collections
•  Chord sequence patterns
•  Tag crowd-sourcing

(50)

Studies

•  Temperament
•  Chord progressions

[Figure: temperaments (Lehman, Kellner, scmtFD, fcmtGE, Vallotti, Just, ET) by period (1930-1979, 1980-1989, 1990-1999, 2000-2009, 2010-2014); axis from 0 to 0.5]
(51)

Examples

•  Chord Sequence Analysis

(52)

Large-Scale Analysis of Chord Sequences

•  Extracted chord sequences (e.g. Am7/E7/Gmaj7, etc.)
   •  On ILM's commercial music collection (1m tracks)
   •  Parallel processing of multiple music clips
   •  Chordino Vamp plugin (Queen Mary University of London)
   •  6 weeks on 8 core virtual machine
•  Retrieve most frequent chord patterns using Sequential Pattern Mining (SPM)
   •  In specific genre subsets (classical, folk, jazz, blues, rock, reggae)
•  Chord pattern graphs visualised with open source Graphviz

Barthet, M. et al. Big Chord Data

(53)

Audio-based Automatic Chord Recognition

•  Chordino Vamp plugin [Mauch and Dixon, ISMIR 2010].
•  Uses chromagram obtained with a non-negative least squares (NNLS) procedure for approximate note transcription.
•  Accuracy of 80% when assessed with MIREX 2009's dataset (popular songs from the Beatles and Queen).

Excerpt from Buddy Guy's Mary Had A Little Lamb

(54)

Chord Sequence Patterns

•  Segmentation of audio recordings into chord sequences
•  Representation of no detection (N)
•  The most frequent (non-contiguous) chord sequences are patterns
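
The notion of a frequent non-contiguous chord pattern can be sketched as follows. This brute-force version is for illustration only and is not the SPM implementation used in the project: it enumerates every ordered subsequence of a fixed length, counts in how many tracks each occurs (its support), and keeps those above a threshold. The track data, the pattern length and the threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(sequences, length=3, min_support=2):
    """Map each ordered chord subsequence to the number of tracks containing it."""
    support = Counter()
    for seq in sequences:
        chords = [c for c in seq if c != "N"]  # drop "no chord detected" segments
        # combinations() preserves order, so each tuple is a subsequence whose
        # elements need not be adjacent in the track. The set() ensures a
        # pattern is counted once per track.
        support.update(set(combinations(chords, length)))
    return {p: n for p, n in support.items() if n >= min_support}

# Hypothetical chord sequences for three tracks.
tracks = [
    ["C", "F", "G", "C"],
    ["C", "N", "Am", "F", "G"],
    ["G", "C", "F", "C"],
]
patterns = frequent_patterns(tracks)
```

Enumerating all subsequences grows very quickly with track length, which is why dedicated sequential pattern mining algorithms (PrefixSpan is one well-known example) prune the search instead.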

(55)

Chord Pattern Length

•  Bell-shaped curve for all genres
•  Jazz and Classical have more patterns: greater harmonic diversity?
•  Folk has more long patterns?

[Figure: number of chord patterns by pattern length for each genre]

(56)

Frequent Chord Patterns: Classical vs. Blues

•  Classical patterns more distributed
•  Blues connects dom7 with dom7, classical doesn’t (dom7 is typically resolved to major or minor tonic)

[Figure panels: Classical; Blues]

(57)

Visualisation of Chord Sequence Patterns

•  Most prominent chord sequences
•  Compare two collections or visualisations

Kachkaev, A. et al. Visualising Chord

(58)

Circular Grid (circle of fifths, straight and twisted)

(59)

Parallel coordinates (chord types, circle of fifths)

(60)

Transition matrix (chord types, roots)
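
A transition matrix over chord roots can be estimated by counting adjacent pairs and normalising each row. The sketch below shows this under simple assumptions (a single hypothetical root sequence, first-order transitions only); it is not the code behind the DML visualisation.

```python
from collections import defaultdict

def transition_matrix(sequence):
    """Row-normalised first-order transition probabilities as nested dicts."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix

# Hypothetical sequence of chord roots from one track.
roots = ["C", "F", "G", "C", "F", "C"]
probs = transition_matrix(roots)
```

Here `probs["F"]` records that F moved to G and to C equally often; applying the same counting to chord types instead of roots gives the other matrix named in the slide title.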

(61)

Tonnetz

(62)

Folk vs. Jazz

(63)

Demo

(64)

Practical exercises

•  Open http://mirg.city.ac.uk/dml-vis/ (similarity) or http://dml.city.ac.uk/vis/ (tempo curve)
•  Follow the worksheet or explore freely
   •  Pitch (class) distribution
   •  Similarity
   •  Tuning
   •  Tempo curves
•  Take notes of results, ideas and hypotheses
•  You can explore the chord sequences here: http://dml.city.ac.uk/chordseqvis/

(65)

Practical exercises

•  Tasks 1
   •  Schoenberg has a relatively flat pitch profile

(66)

Practical exercises

•  Tasks 2
   •  Mozart sonatas seem more homogeneous

(67)

Practical exercises

•  Tasks 3
   •  Tuning was stable between these periods

(68)

Practical exercises

•  Tasks 4
   •  Tempo smooths out, some final ritardando

(69)

Where to go from here …

•  Found something interesting?
•  Results need careful interpretation with respect to
   •  Noisy data
   •  Data collection
   •  Significance
   •  Cross-validation
   •  Meaning …
•  Can be challenging, needs expertise in music and data

(70)

Where to go from here … (ctd.)

•  Follow up ideas and hypotheses by close examination of the data
•  Extract data with SPARQL interface (more on Semantic Web and SPARQL tomorrow)
•  Use programming languages and statistical tools, e.g. Python, Pandas, SciPy, R, Matlab, SPSS, …
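
As a starting point for extracting data, a SPARQL query is sent to an endpoint as a URL parameter named `query`, as defined by the SPARQL protocol. The sketch below only builds such a request URL and does not contact any server. The endpoint address is a placeholder, and the use of `mo:Track` and `dc:title` from the Music Ontology and Dublin Core is an assumption about how the data might be modelled, not a description of the DML schema.

```python
from urllib.parse import urlencode

# Placeholder address (assumption): replace with the endpoint given in the session.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: list ten tracks with their titles.
QUERY = """
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?track ?title WHERE {
  ?track a mo:Track ;
         dc:title ?title .
}
LIMIT 10
"""

def build_request_url(endpoint, query):
    """Encode the query text into the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(ENDPOINT, QUERY)
```

The resulting URL could then be fetched with any HTTP client, and the returned table loaded into Pandas or R for the statistical steps listed above.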

(71)

The end

•  Open discussion
•  Thank you …
