Interactive computer music systems are those whose behavior changes in response to musical input. Such responsiveness allows these systems to participate in live performances of both notated and improvised music. This text reviews a wide range of interactive systems from the perspectives of several different fields. Each field, particularly music theory, artificial intelligence, and cognitive science, has developed techniques appropriate to various facets of interactive music systems. By relating a large number of existing programs to established bodies of work, as well as to each other, I will develop a framework for discussing both individual contributions and the field as a whole.

This book grew out of a doctoral thesis describing my own interactive system, called Cypher. For that reason, extensive consideration is given to the theoretical foundations and practical use of that program. Though the attention devoted to Cypher is out of proportion to that given the other programs reviewed here, the lattice of perspectives through which interactive systems will be viewed is more easily constructed with one example serving to focus the effort. Nonetheless, the book covers many other current applications in considerable detail, particularly when used in conjunction with the companion CD-ROM of audio and program examples. Further, basic concepts and characteristic issues will be illustrated using the Max programming environment, a graphic language for building interactive music systems (Puckette 1991).

1.1 Introduction

The use of computers has expanded musical thought in two far-reaching directions, the first of which concerns the composition of timbre. Digital computers afford the composer or sound designer unprecedented levels of control over the evolution and combination of sonic events. The second expansion stems from the computer’s ability to implement algorithmic methods for generating musical material. As in the case of timbral synthesis, the use of computers for algorithmic composition began with the earliest essays in the field (Koenig 1971). Recently, however, an important extension to this line of development has arisen from the realization of such processes in the context of live performance. By attaining the computation speeds needed to execute compositional algorithms in real time, current computer music systems are able to modify their behavior as a function of input from other performing musicians. Such changes of behavior in response to live input are the hallmark of interactive music systems. Interactivity qualitatively changes the nature of experimentation with compositional algorithms: the effect of different control variable values on the sounding output of the method can be perceived immediately, even as the variables are being manipulated (Chadabe 1989).
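
As a concrete illustration, the fragment below is a minimal sketch (in Python, rather than in any of the systems discussed in this book) of a generative process whose control variables are consulted anew on every step; the variable names and values are purely illustrative. Because the loop reads the controls while it runs, a change made during performance is heard on the very next step rather than on the next complete run of the algorithm.

    import random
    import time

    # Illustrative control variables; in a live setting these would be
    # changed from a fader, a MIDI controller, or another process while
    # the generator keeps running.
    controls = {"density": 0.5, "register": 60}

    def generate_step(controls):
        """Produce at most one pitch for the current time step,
        depending on the current control settings."""
        if random.random() < controls["density"]:
            return controls["register"] + random.randint(-7, 7)
        return None

    # Toy run loop: each pass reads the controls afresh.
    for step in range(16):
        pitch = generate_step(controls)
        if pitch is not None:
            print(f"step {step}: play MIDI note {pitch}")
        if step == 8:  # simulate a live tweak of a control variable
            controls["density"] = 0.9
        time.sleep(0.1)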

In this book, I first place the musical motivations and possibilities accompanying interactive systems at the forefront of the discussion. Second, I examine practical considerations of how to build, analyze, and extend these systems. Finally, I explore in detail the perspectives afforded by artificial intelligence, cognitive science, and music theory, both in terms of their contributions to the growth of the field to date and their potential impact on the continuing evolution of interactive music systems.

1.2 Machine Musicianship

In using a computer for composition or performance, the most fundamental question to be asked about any particular system is, What musical purpose does it serve? This book will cover several technical areas, some in considerable detail; however, the primary focus will be on the musical opportunities afforded by interaction and the ways in which these opportunities have been explored and elaborated by compositions and improvisations using them.

The responsiveness of interactive systems requires them to make some interpretation of their input. Therefore, the question of what the machine can hear is a central one. Most of the programs reviewed in this book make use of the abstraction provided by the MIDI standard (Loy 1985). An additional level of information can be gleaned from an analysis of the audio signal emitted by acoustic musical instruments. MIDI input and audio signals are low-level, weakly structured representations, which must be processed further by the program to advance any particular musical goals. How these low-level signals are interpreted and structured into higher-level representations is a research topic common to all interactive systems. The representations and processes available for constructing responses form the other broad area of inquiry.
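
A rough sense of what such low-level input looks like, and of the first structuring step most systems perform, can be given with a small sketch (in Python; the message format and field names are assumptions, not any particular system's interface) that pairs raw MIDI note-on and note-off messages into note events with onset, duration, pitch, and velocity:

    def notes_from_midi(messages):
        """Pair note-on and note-off messages into structured note events.

        `messages` is assumed to be a list of (time, status, pitch, velocity)
        tuples with channel information already stripped; a real MIDI parser
        must also handle running status, channels, and non-note messages."""
        sounding = {}   # pitch -> (onset time, velocity) of notes still held
        notes = []
        for time, status, pitch, velocity in messages:
            note_on = status == 0x90 and velocity > 0
            note_off = status == 0x80 or (status == 0x90 and velocity == 0)
            if note_on:
                sounding[pitch] = (time, velocity)
            elif note_off and pitch in sounding:
                onset, vel = sounding.pop(pitch)
                notes.append({"pitch": pitch, "onset": onset,
                              "duration": time - onset, "velocity": vel})
        return sorted(notes, key=lambda n: n["onset"])

Even so rudimentary a representation is already one level above the byte stream; the analysis tasks described in this book build on events of roughly this kind.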

Several of the systems we will review here interpret the input by emulating human musical understanding. Programming a computer to exhibit humanlike musical aptitude, however, is a goal with implications for a much broader range of applications. For almost any computer music program, in fact, some degree of musical understanding could improve the application’s performance and utility. For example, in the automated editing of digital audio recordings, “simultaneous access to the low-level representations of the music in the signal and the higher-level constructs familiar to musicians would allow [automated editors] to perform operations and transformations whose realization by signal processing techniques alone would range from cumbersome to unimaginable” (Chafe, Mont-Reynaud, and Rush 1982, 537). With interactive systems, interpretation of the input is unavoidable. In non-real-time applications, such as sequencers or notation programs, an understanding by the program of concepts such as phrase, meter, direction, and so on could extend their function in the direction of a computer assistant, or interlocutor, able to suggest variations in tempo for the realization of a sequence or to locate points of significant change in a notated composition.

Capturing Musical Concepts

Communication between musicians, verbal as well as musical, assumes certain shared concepts and experiences. Observing, for example, a rehearsal of chamber music, or a piano lesson, one might hear a comment such as, “Broaden the end of the phrase.” Interpreting that instruction engages a complex collection of listening and performing skills, which must be related to each other in a reasonably precise way. The necessary relations are rarely described verbally beyond the use of just such admonitions; if a student were to shape the phrase poorly, a typical next response for the teacher would be simply to play or sing it.

Pursuing such common musical effects in computer music systems often leads to alien and unwieldy constructions, precisely because the software does not share the concepts and experiences that underlie musical discourse. Many of the most persistent problems in computer music (“mechanical” sounding performances, lack of high-level editing tools) come from an algorithmic inability to locate salient structural chunks or describe their function. Although research has begun to show us systematic ways in which human performers add expression to their rendering of a score (Palmer 1988), a general application of the fruits of this research will be impossible until programs can find the appropriate structural units across which to apply expressive deformations. In other words, it does computer music systems little good to know how human players broaden phrase boundaries if those systems cannot find the phrases in the first place. Among the concepts the machine would have to employ to “broaden the end of the phrase” are beat, harmonic progression, meter, and decelerando (Figure 1.1). These concepts rely in turn, I maintain, on even more primitive perceptual features such as loudness, register, density, and articulation, and the way these change in time.
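
As an indication of how such primitive features might be computed, the following sketch (in Python; the thresholds and feature labels are arbitrary illustrations, not Cypher's actual feature extractors) classifies a window of note events of the kind produced by the MIDI-parsing sketch above:

    from statistics import mean

    def describe_window(notes, window_length):
        """Rough perceptual features for a window of note events.

        Each note is a dict with 'pitch', 'onset', 'duration', and
        'velocity'; window_length is the window's duration in seconds.
        Thresholds are illustrative only."""
        if not notes:
            return None
        return {
            "register": "high" if mean(n["pitch"] for n in notes) >= 60 else "low",
            "loudness": "loud" if mean(n["velocity"] for n in notes) >= 64 else "soft",
            "density": len(notes) / window_length,   # notes per second
            "articulation": "legato"
                if mean(n["duration"] for n in notes) > 0.5 else "staccato",
        }

Tracking how these values change from window to window is what allows a program to begin describing musical behavior in time rather than isolated events.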

In their interpretation of musical input, interactive systems implement some collection of concepts, often related to the structures musicians commonly assume. Each interactive system also includes methods for constructing responses, to be generated when particular input constructs are found. As methods of interpretation approach the successful representation of human musical concepts, and as response algorithms move toward an emulation of human performance practices, programs come increasingly close to making sense of and accomplishing an instruction such as “broaden the end of the phrase.”

Before embarking on the formidable task of trying to achieve such behavior, one could ask why it is important for a computer program to approach human performance practices. In fact, throughout the early stages of electronic and computer music development, an expressed goal was often the elimination of human performers, with all their limitations and variability. “About 1920, when the slogan ‘objective music’ was in vogue, some famous composers (Stravinsky, for instance) wrote compositions specifically for pianola, and they took advantage of all the possibilities offered by the absence of restraints that are an outcome of the structure of the human hand. The intent, however, was not to achieve superior performance but to restrict to an absolute minimum the intervention of the performer’s personality” (Bartók 1937, 291).

Figure 1.1

Computer music has, in fact, provided a perfect vehicle for eliminating the performer’s personality. Compositions realized on tape can be painstakingly constructed by the composer, who in effect “performs” the work while entrusting it to a fixed realization, which is then played back without any further human intervention. Many of the most compelling and durable compositions in the field have been made in exactly this way. Eliminating performers entirely is hardly a desirable outcome, however, and one that few if any composers in the field would advocate. Their elimination is undesirable, beyond the purely social considerations, because human players understand what music is and how it works and can communicate that understanding to an audience, whereas computer performers as yet do not.

Works for performers and tape have been an expression of the desire to include human musicianship in computer music compositions. Coordination between the fixed realization of the tape and the variable, expressive performance of the human players, however, can become problematic. Such difficulties are more pronounced when improvisation becomes part of the discourse. And, as taped and performed realizations are juxtaposed, the disparity between levels of musicality evinced by the two often becomes untenable.

Composition by Refinement

Interactive music systems contribute to a process of composition by refinement. Because the program reacts immediately to changes in configuration and input, a user can develop compositional applications by continually refining initial ideas and sketches, up to the development of complete scripts for a performance situation in which the computer can follow the evolution and articulation of musical ideas and contribute to these as they unfold.

Further, many interactive systems can be considered applied music theories. Music theory, in its best form, is the scholarly attempt to describe the processes of composing or listening to music. Computer systems able to implement this work in real time allow the musician to assess the validity of the intellectual enterprise by hearing it function in live musical contexts. The construction of formal processes is then judged by the ear, from the sounding result, rather than through more words on paper. Moreover, implementation in a computer program demands the formalization of a theory to the point where a series of machine instructions can realize it. For suitable theories, the added rigor of realization by computer can clarify their formulation and make them available in a form from which they can be extended or used in other computational tasks. When a theory has been brought to the point of interactivity, it can be applied to the production and analysis of music in its native environment – that is, performed and experienced as live music has always been.

1.3 Classification of Interactive Systems

A primary objective of this book is to provide a framework within which interactive systems may be discussed and evaluated. Many of the programs developed to date have been realized in relative isolation from one another, with little scope for building on the work of earlier efforts. Now, several fundamental tools of the trade have become standardized and are no longer so subject to ad hoc solutions. Here I will propose a rough classification system for interactive music systems. The motivation for building such a set of classifications is not simply to attach labels to programs but to recognize similarities between them and to be able to identify the relations between new systems and their predecessors.

This classification system is built on a combination of three dimensions, whose attributes help identify the musical motivations behind types of input interpretation and methods of response. Each dimension will be described using some points along its continuum of possibilities, points generally close to the extremes. These points should not be considered distinct classes, however: any particular system may show some combination of the attributes outlined here. Even so, the dimensions are useful in identifying characteristics that can distinguish, and draw relations between, interactive programs.

The first dimension distinguishes score-driven systems from those that are performance-driven.

  1. Score-driven programs use predetermined event collections, or stored music fragments, to match against music arriving at the input. They are likely to organize events using the traditional categories of beat, meter, and tempo. Such categories allow the composer to preserve and employ familiar ways of thinking about temporal flow, such as specifying that some events occur on the downbeat of the next measure or at the end of every fourth bar.
  2. Performance-driven programs do not anticipate the realization of any particular score. In other words, they do not have a stored representation of the music they expect to find at the input. Further, performance-driven programs tend not to employ traditional metric categories but often use more general parameters, involving perceptual measures such as density and regularity, to describe the temporal behavior of music coming in.

Another distinction groups response methods as being transformative, generative, or sequenced.

  1. Transformative methods take some existing musical material and apply transformations to it to produce variants. According to the technique, these variants may or may not be recognizably related to the original. For transformative algorithms, the source material is complete musical input. This material need not be stored, however – often such transformations are applied to live input as it arrives.
  2. For generative algorithms, on the other hand, what source material there is will be elementary or fragmentary – for example, stored scales or duration sets. Generative methods use sets of rules to produce complete musical output from the stored fundamental material, taking pitch structures from basic scalar patterns according to random distributions, for instance, or applying serial procedures to sets of allowed duration values. (A minimal sketch contrasting transformative and generative methods follows this list.)
  3. Sequenced techniques use prerecorded music fragments in response to some real-time input. Some aspects of these fragments may be varied in performance, such as the tempo of playback, dynamic shape, slight rhythmic variations, etc.
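
The following sketch (in Python; the scale, the transformations, and the random-walk rule are all arbitrary examples, not any particular composer's algorithms) contrasts the first two response types: a transformative method that makes variants of complete input material, and a generative method that builds a line from nothing more than a stored scale and a rule.

    import random

    SCALE = [60, 62, 64, 65, 67, 69, 71, 72]   # stored C-major scale (illustrative)

    def transform(input_pitches, interval=7, invert_around=None):
        """Transformative response: variants of the live input itself,
        here by transposition and optional inversion about a pitch axis."""
        out = []
        for pitch in input_pitches:
            p = pitch + interval
            if invert_around is not None:
                p = 2 * invert_around - p
            out.append(p)
        return out

    def generate(length=8):
        """Generative response: no complete input material, only a stored
        scale and a rule (a bounded random walk) for producing output."""
        index = random.randrange(len(SCALE))
        line = []
        for _ in range(length):
            index = max(0, min(len(SCALE) - 1, index + random.choice([-2, -1, 1, 2])))
            line.append(SCALE[index])
        return line

    print(transform([60, 64, 67], interval=5))   # a variant of a live fragment
    print(generate())                            # a newly generated line

Sequenced techniques, by contrast, would simply play back a stored fragment, perhaps rescaling its tempo or dynamic shape in response to the input.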

Finally, we can distinguish between the instrument and player paradigms.

  1. Instrument paradigm systems are concerned with constructing an extended musical instrument: performance gestures from a human player are analyzed by the computer and guide an elaborated output exceeding normal instrumental response. If such a system were played by a single performer, the musical result would be heard as a solo.
  2. Systems following a player paradigm try to construct an artificial player, a musical presence with a personality and behavior of its own, though it may vary in the degree to which it follows the lead of a human partner. A player paradigm system played by a single human would produce an output more like a duet.

For the moment, a brief example will clarify how the dimensions are used. Score followers are a group of programs able to accompany a human instrumental soloist by matching her realization of a particular score against a stored representation of that score, simultaneously performing a stored accompanimental part. Such applications are a perfect example of score-driven systems. The response technique is sequenced, since everything the machine plays has been stored in advance. Finally, score followers can be regarded as player paradigm systems, because they realize a recognizably separate musical voice, assuming the traditional role of accompanist in the performance of instrumental sonatas.
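
A deliberately naive sketch of the matching step may make the score-driven, sequenced character of these systems clearer. The fragment below is in Python; the stored solo part, the accompaniment table, and the fixed look-ahead window are illustrative assumptions, and published score followers (e.g., Dannenberg 1984; Vercoe 1984) use considerably more robust matching over both pitch and timing.

    SOLO_SCORE = [60, 62, 64, 65, 67]            # expected solo pitches (illustrative)
    ACCOMPANIMENT = {0: [48], 2: [52], 4: [55]}  # score position -> chord to play

    def follow(performed_pitch, position, window=3):
        """Look for the performed pitch within a small window ahead of the
        current score position; if found, trigger any accompaniment stored
        at that position and advance. Unmatched pitches (wrong notes) leave
        the position unchanged."""
        for offset in range(window):
            i = position + offset
            if i < len(SOLO_SCORE) and SOLO_SCORE[i] == performed_pitch:
                if i in ACCOMPANIMENT:
                    print(f"position {i}: play accompaniment {ACCOMPANIMENT[i]}")
                return i + 1
        return position

    position = 0
    for pitch in [60, 61, 62, 64, 65, 67]:       # a performance with one wrong note (61)
        position = follow(pitch, position)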