fred voisin’s website

computer music producer, since 1989

Playing Live Music Using Artificial Neural Networks

Conference at NeuroArts Festival, Plymouth University, February 11th, 2011.

Following some requests after the NeuroArts Festival at Plymouth University on February 11th, 2011, here is my talk, rewritten from my notes and augmented with some details, including content from the slides and direct links to the cited works.
Thanks to Eduardo Miranda, who invited me, and to the chairman Roy Ascott for his encouragement :)

Background : Experimental Ethnomusicology

My very first interest in musicology was the music produced by people all around the world. From 1989 to 1995, I learned a lot from studying some of the cognitive processes involved in musical scales and tunings in Central Africa and Java. These studies helped me to develop some considerations on musical pertinence in relation to technology, culture and the mental representation of music (this work was conducted with the ethnomusicologist Simha Arom).
Using MIDI computers and synthesizers on the spot with Oubanguian musicians, Aka Pygmies and, later, Javanese gamelan makers turned the scientific modeling process into an interactive experience, based on digital technology that we specially adapted to these cultures.

Experimental Ethnomusicology
Experiment on the musical scales with Aka Pygmies, Central African Republic, 1990

Ethnomusicology reveals unsuspected mental representations of music, which are both private and public representations, so that music has to be studied as a cognitive process involving know-how and implicit knowledge, within a context which is both nature and society. It is the role of ethnomusicology to make such knowledge explicit. This requires developing many psycho-acoustical experiments in different cultures, and the investigation may take the form of an interactive loop of knowledge modeling. From my point of view, the question of the so-called “universals of music” is less urgent than that of the diversity of mental representations of music.

  • more about experimental ethnomusicology

    My First Experiments with Artificial Neural Networks

    When I joined Ircam as a computer music producer in 1995, I kept more or less the same approach within a European context of “contemporary” music production : developing original tools for new and particular conceptions of music.
    In this context, from 1999 to 2001, I developed with Kasper Toeplitz and Myriam Gourfink a new approach for composing dance scores, based on an analysis of Laban notation in relation to Gourfink’s choreography (cf. LOL), searching for a way to make the computer able to learn implicit rules of choreography. Although artificial neural networks were not being experimented with at Ircam, I began to experiment with them out of curiosity. I was fascinated by the idea of making machines able to learn without any ‘logic’, only from implicit rules. I wrote a first simple application to try this new computing paradigm for myself in 2001 at Ircam with « L’écarlate », by Tœplitz & Gourfink.

L’écarlate (rehearsal)
L’écarlate, by Myriam Gourfink & Kasper Tœplitz, Ircam 2001

In « L’écarlate », our work on choreography was related to music composition at the level of formalisation and notation. We experimented with artificial neural networks for dance score notation, not for computer vision or gesture recognition : on stage, all the dance gestures were transcribed on the fly by the dance notator (Laurence Marthouret) via a computer interface specially adapted to transcribing gestures :

Computer interface for dance transcription

These hand-made transcriptions were then converted to vectors and analysed in real time with a Learning Vector Quantizer (LVQ), in order to compare the actual gestures on stage with the ones written in the dance score (see below : at the top is a gesture encoded from the score, and at the bottom the LVQ recognition from the stage, with blue neurons showing errors) :

Learning Vector Quantizer analysis of some dance gesture
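As a minimal sketch of the kind of comparison described above, here is a generic LVQ1 classifier in Python. It is not the software used for « L’écarlate » : the four-component gesture vectors, the labels and the function names are hypothetical, chosen only to illustrate how a quantizer can both recognise a gesture and report its deviation from the score.

```python
import math

def nearest_prototype(prototypes, x):
    """Return the index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i][0], x))

def lvq1_step(prototypes, x, label, rate=0.1):
    """One LVQ1 update: pull the winner toward x if labels agree, else push it away."""
    i = nearest_prototype(prototypes, x)
    w, w_label = prototypes[i]
    sign = 1.0 if w_label == label else -1.0
    prototypes[i] = ([wi + sign * rate * (xi - wi) for wi, xi in zip(w, x)], w_label)
    return i

def compare_with_score(prototypes, observed, expected_label):
    """Classify an observed gesture and report its deviation from the winner."""
    i = nearest_prototype(prototypes, observed)
    w, w_label = prototypes[i]
    errors = [abs(oi - wi) for oi, wi in zip(observed, w)]
    return w_label, w_label == expected_label, errors

# Hypothetical encoding: each gesture is a vector of four values in [0, 1].
prototypes = [([0.0, 0.0, 1.0, 1.0], "gesture_A"),
              ([1.0, 1.0, 0.0, 0.0], "gesture_B")]
training = [([0.1, 0.0, 0.9, 1.0], "gesture_A"),
            ([0.9, 1.0, 0.1, 0.0], "gesture_B")]
for _ in range(20):
    for x, label in training:
        lvq1_step(prototypes, x, label)

# A gesture transcribed from the stage, compared with what the score expects.
label, matches, errors = compare_with_score(prototypes, [0.2, 0.1, 0.8, 0.9], "gesture_A")
```

In this sketch, `errors` plays the role of the deviation between stage and score : it is kept as data rather than discarded.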

It is important to note that, from such an artistic perspective, the errors detected by the LVQ give relevant information about the dancer’s performance on stage, as well as about the difficulty the LVQ has in learning the score. Here the error has a particular meaning : it makes sense. In « L’écarlate », these deviations were encoded as stimuli (input) to train a recurrent Multi-Layer Perceptron (MLP) that played a simple musical score with sinusoidal sound generators (output) :

MLP with internal feedback for L’écarlate
  • more about « L’écarlate »
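To make the idea of an MLP with internal feedback more concrete, here is a small Python sketch of an Elman-style recurrent network whose outputs are mapped to sine oscillator frequencies. It only shows the forward pass with random weights ; training is omitted. The layer sizes, the frequency range and all names are assumptions for illustration, not the network actually used in « L’écarlate ».

```python
import math
import random

class RecurrentMLP:
    """Tiny Elman-style network: the previous hidden state is fed back as extra input."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in + n_hidden)  # stimulus + feedback -> hidden
        self.w_out = matrix(n_out, n_hidden)           # hidden -> output
        self.hidden = [0.0] * n_hidden                 # internal state (feedback)

    def step(self, stimulus):
        """Compute one time step and return outputs in the open interval (0, 1)."""
        x = list(stimulus) + self.hidden
        self.hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                       for row in self.w_in]
        return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, self.hidden))))
                for row in self.w_out]

def to_frequencies(outputs, low=110.0, high=880.0):
    """Map each output to an oscillator frequency in Hz (assumed range, log scale)."""
    return [low * (high / low) ** y for y in outputs]

# Hypothetical deviation vector coming from the gesture analysis.
net = RecurrentMLP(n_in=4, n_hidden=6, n_out=3)
deviation = [0.1, 0.0, 0.2, 0.0]
first = net.step(deviation)
second = net.step(deviation)  # same stimulus, different internal state
frequencies = to_frequencies(second)
```

Because of the feedback, presenting the same stimulus twice does not give the same output : the network's response depends on its own history.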

    On Computer Learning and Music

    At this point, I may question the aims of using computer learning and neural networks in relation to music production. In my view, the most important aspect of such computational processes is related to the structure of memory : I suggest that, for a musician or an artist, it is much more convenient to play with distributed memories, which are not random-access memories like traditional RAM but which may be accessed according to their content. This is an intrinsic property of neural networks. A second aspect is the ability such systems offer to interact with learning processes, through these non-random memories.
    Such an ability is related to the question of pertinence : different approaches may have different aims. For instance, where an engineering perspective tries to minimize errors, an artistic perspective may consider the error as a significant deviation related to a context which may change : reaching a ‘goal’ may be less interesting than encountering ‘errors’. Also, the usual terminology for classifying the different types of artificial neural networks, according to their ‘supervised’ vs. ‘unsupervised’ learning processes or their ‘associative’ vs. ‘auto-associative’ memories, is not relevant in some respects : when artificial neural networks interact with human beings, for instance in an artistic context, the learning process is always supervised in some way. The networks adapt themselves, and their memory, to a context that defines a semiotic loop.

loop for NeuroArts, 2008
© Frédéric Voisin

Here, pruning neurons or synapses, or adding some amount of noise to the synaptic connections (temperature), is particularly relevant to the dialectical issue of repetition versus variation in music production. Furthermore, some “pathological” behaviors of artificial neural networks may be interesting cases for the arts.
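The two manipulations mentioned above can be sketched in a few lines of Python. This is only an illustration under simple assumptions : weights are stored as a list of rows, the noise is Gaussian with a standard deviation called `temperature`, and pruning zeroes the weakest synapses. The function names are hypothetical.

```python
import random

def add_synaptic_noise(weights, temperature, rng):
    """Return a copy of the weights perturbed by Gaussian noise scaled by temperature."""
    return [[w + rng.gauss(0.0, temperature) for w in row] for row in weights]

def prune_synapses(weights, threshold):
    """Return a copy where synapses weaker than the threshold are set to zero."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

weights = [[0.5, -0.05], [0.2, 0.01]]
rng = random.Random(1)

frozen = add_synaptic_noise(weights, 0.0, rng)  # no noise: pure repetition
varied = add_synaptic_noise(weights, 0.1, rng)  # some noise: variation
pruned = prune_synapses(weights, 0.1)           # weak connections removed
```

With a temperature of zero the network repeats itself exactly ; raising it introduces variation, which is the musical parameter of interest here.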
A major question for playing music with artificial neural networks remains : how should music be represented so that they learn and play it in a suitable way ?
Encoding music data into vectors representing neural stimuli is very close to elaborating a music notation, and to transcription. It defines how each agent represents the implicit knowledge to be learned. This is also a standard question in ethnomusicology as well as in composition : music notation may reflect not only ‘data’ but also, with some pragmatism, more or less explicit mental representations of music to be actualised. Composing music also means setting up an instrument or an orchestra : designing neural architectures, mixing different architectures of neural networks, adapting architectures to the goals and, finally, adapting the musical aims to the neural architectures. The experimental aspect of this research lies in learning the behavior of different neural architectures and playing with them to gain an empirical understanding of learning processes, which requires many hours of rehearsal.
In the very early 2000s, PC CPUs were not fast enough to play realistic music scores in real time with artificial neural networks and make them learn at the same time. In 2000, I wrote a first simple application with a recurrent MLP to demonstrate the reliability of rhythmic interpolations with such a neural network, with learning and control in real time (cf. « Pulse3G »). Even though the process was very slow, it was much more interesting than linear interpolation.
In 2004, Robin Meier and I developed at CIRM (Nice) some new applications of Self-Organizing Maps (SOM), which require less CPU than MLPs to compute learning processes. We were then able to play them in real time at reasonable tempi, and began to experiment with their control, behavior and musical abilities in some comfort...
A very simple application was the following example : a SOM made of 16 neurons learns 5 vowels from FFT frames and, at the same time, plays back the neuron which has the maximum activation (through an inverse FFT), while a video renders the neural activations of the SOM. This video is particularly important since it permits visual control of the internal state of the SOM, i.e. of the learning process in significant detail, which is easier to understand than with the previous MLP :

  • see video : HAL2004
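The learn-and-play-back principle of this example can be sketched in Python as follows. This is a generic one-dimensional SOM, not the 2004 application itself : the eight-bin toy “spectra” stand in for real FFT frames, the inverse FFT is left out, and the learning rate, neighbourhood radius and names are assumptions.

```python
import math
import random

class SOM:
    """One-dimensional self-organizing map with a Gaussian neighbourhood."""

    def __init__(self, n_neurons, dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]

    def winner(self, x):
        """Index of the best-matching neuron (smallest Euclidean distance to x)."""
        return min(range(len(self.weights)),
                   key=lambda i: math.dist(self.weights[i], x))

    def learn(self, x, rate=0.2, radius=1.5):
        """Move the winner and its map neighbours toward x; return the winner index."""
        best = self.winner(x)
        for i, w in enumerate(self.weights):
            influence = math.exp(-((i - best) ** 2) / (2.0 * radius ** 2))
            self.weights[i] = [wi + rate * influence * (xi - wi)
                               for wi, xi in zip(w, x)]
        return best

    def play_back(self, x):
        """Learn x, then return the winner's weights as the frame to resynthesize."""
        return list(self.weights[self.learn(x)])

# Five toy spectra of eight bins each, standing in for the FFT frames of five vowels.
vowels = [[1.0 if b in (v, v + 1) else 0.0 for b in range(8)] for v in range(5)]

som = SOM(n_neurons=16, dim=8)
frame = vowels[0]
distance_before = math.dist(som.weights[som.winner(frame)], frame)
played = som.play_back(frame)
distance_after = math.dist(played, frame)

# Learning and playing at the same time, as in the example above.
for _ in range(50):
    for spectrum in vowels:
        som.play_back(spectrum)
```

Each call to `play_back` returns the weight vector of the winning neuron, which is what would be sent to the inverse FFT ; early in the learning process it sounds unlike the input, and it moves toward it as the map adapts.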
    With Robin Meier, we later designed neural agent architectures including two SOMs linked in an internal loop : the output of the first SOM is the input of the second SOM, and the output of the second goes back to the input of the first, interlaced with the external stimuli (input). This internal loop gives the agent a certain degree of autonomy and lets it remain active even without any stimuli, according to its internal state, which may be controlled at any time according to the musical project (see scheme below).
Neural Agent with internal loop
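A rough Python sketch of such an agent is given below, under several simplifying assumptions : each SOM uses a winner-take-all update without neighbourhood, the output of a SOM is the weight vector of its winner, and the feedback is “interlaced” with the stimulus by a weighted mix controlled by a hypothetical `autonomy` parameter. None of these choices is taken from the original software.

```python
import math
import random

def make_som(n_neurons, dim, rng):
    """Create a SOM as a plain list of random weight vectors."""
    return [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]

def som_step(som, x, rate):
    """Winner-take-all update (neighbourhood omitted); return the winner's weights."""
    best = min(range(len(som)), key=lambda i: math.dist(som[i], x))
    som[best] = [w + rate * (xi - w) for w, xi in zip(som[best], x)]
    return list(som[best])

class LoopedAgent:
    """Two SOMs in a ring: SOM 1 -> SOM 2 -> back to the input of SOM 1."""

    def __init__(self, dim, n_neurons=16, seed=0):
        rng = random.Random(seed)
        self.first = make_som(n_neurons, dim, rng)
        self.second = make_som(n_neurons, dim, rng)
        self.feedback = [rng.random() for _ in range(dim)]  # internal state

    def step(self, stimulus=None, autonomy=0.5, rate=0.1):
        """autonomy=1.0 ignores the stimulus; autonomy=0.0 ignores the feedback."""
        if stimulus is None:
            mixed = list(self.feedback)
        else:
            mixed = [autonomy * f + (1.0 - autonomy) * s
                     for f, s in zip(self.feedback, stimulus)]
        self.feedback = som_step(self.second, som_step(self.first, mixed, rate), rate)
        return self.feedback

agent = LoopedAgent(dim=4)
silent = [agent.step() for _ in range(3)]       # activity without any stimulus
driven = agent.step(stimulus=[1.0, 0.0, 0.0, 1.0], autonomy=0.3)
```

The calls without a stimulus show the point of the loop : the agent keeps producing output from its internal state alone, and the stimulus only bends that activity when it is present.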

Free Improvisation

Even if such a neural architecture with two SOMs cannot directly learn time sequences (no more than plain SOMs, which are not suited to this purpose), it provides some interesting behaviors as soon as its control (i.e. stimuli, learning factor, randomness, internal loop, autonomy...) participates directly in some stochastic music style (although, under some conditions, the behavior may be periodic !). Under this condition, it can perform as a solo instrument (see for instance : « For Alan Turing », by Robin Meier, 2004), and it gains interest when the musical project makes several agents interact, the behavior of one being propagated to the others. In this way, we developed bands or orchestras of SOM-based agents to play live improvised music.
In 2004, our first experiment, « Caresses de Marquises », was performed in the open air with a small band of neural agents for 12 hours, for the « Nuit Blanche » festival in Paris. Due to CPU limitations, each agent was designed with 60 to 90 neurons ; up to six agents played different ad hoc synthesisers together, each agent communicating with the others under our control. Our role, as conductors of such a neural band, was to dynamically define arbitrary stimuli, which were never heard but only rendered to the audience through lights, and to adapt the topology of the inter-agent network according to the agents’ activity, with some visual control of their activations, which were also rendered to the audience.

SOM-based Multi-Agent Orchestra

Later, in 2006, with « Symphonie des machines » at Sophia Antipolis, we extended the same principle to a big orchestra of approximately 100 agents of 300 neurons each, using two Linux clusters. The performance also took place in the open air, on five sites which covered a large area and communicated over the internet.
An important aspect of this performance is that it took advantage of the skills of the different companies which hosted the performance at Sophia Antipolis : engineers from Eurecom, France Telecom, Infineon, Philips and INRIA participated, under our supervision, in setting up the computing architecture of the orchestra and made it a successful collective adventure. Rehearsing music for days with such an orchestra taught us quite a lot about the behavior of populations of neural agents. Nevertheless, at this time, our own know-how still requires more experimental studies from a musicological perspective to constitute explicit knowledge.

  • see video : « Symphonie des machines »

    Improvisation With References

    The use of Recurrent Oscillatory Self-Organizing Maps (ROSOM) makes it possible for a SOM-like system to learn periodic sequences. In 2008, at the Palais de Tokyo in Paris, we built a musical system for « Last Manœuvres in the Dark » (LMD, by Fabien Giraud & Raphaël Siboni) with a population of ROSOM-based agents of approximately 100 neurons each, to generate original hits in a ‘live’ hardcore gothic-rock style. The general scheme of this system differs from the previous one in order to gain some simplicity : first of all to make it run stand-alone for months, 24 hours a day, and also to generate music which sounds pleasant and is easy to identify, i.e. which follows the rules of pop-rock music, for thousands of visitors.

LMD general scheme

First, we made a master SOM of 3600 neurons learn a very large corpus of segments of chosen songs and their separate voices, in a format adapted to the ROSOMs. All the segments were cut by hand and their encoding was adapted to our own knowledge of this style of music. The whole system was completely automated by a structured script, with some random values for variation. An automatic conductor controlled each ‘live’ session by dynamically querying the classification made by the master SOM, in order to dispatch to the agents segments of different origins that might match together, as sources for their own learning processes. Thus, for each 20-minute session, a new musical sequence progressively emerged from the mix of these learning processes. Even though the global musical structure was fixed, beginning with a slow tempo and long periods and ending with a fast tempo and short periods to reach a climax, the many variations due to the stochastic controls on the agents’ behaviors meant that the generated tunes were never the same.

LMD @ Palais de Tokyo (Superdome), Paris 2008

For more details, see :

  • La musique de LMD
  • « Last Manœuvres in the Dark » (with a video)


    In conclusion, I would point out the strong relation I have tried to illustrate between Music and Science : music research is an experimental practice, since it explores new know-how with pragmatic alternatives, while musicology would gain from following experimental methods and from gaining access to some « exotic » psycho-acoustics by extending its domain to the diversity of cultures : each depends on the other.
    Also, to answer the informal discussion about the function of music we had yesterday, I would say that music is a field for experimentation insofar as, according to Gilles Deleuze, music is related to a territory. At the turn of the XXth and XXIst centuries, my territory seems to be a mix of computers and neural networks ; finally, as the ethnomusicologist Gilbert Rouget said, since I may “music to survive”, the question of Arts and Science is just a question of style.

  • Read more about my works with neural networks

See online : NeuroArts Workshop @ University of Plymouth