This chapter reviews a broad spectrum of existing interactive systems, elaborating and refining a framework for discussion of these programs in the process. Many of the fundamental issues surrounding interactive systems will be sharpened as they are considered in connection with working examples. Further, a range of compositional concerns and motivations come to light through a careful consideration of the music actually made.

3.1 Cypher

Cypher is an interactive computer music system that I wrote for composition and performance. The program has two main components: a listener and a player. The listener (or analysis section) characterizes performances represented by streams of MIDI data, which could be coming from a human performer, another computer program, or even Cypher itself. The player (or composition section) generates and plays musical material. There are no stored scores associated with the program; the listener analyzes, groups, and classifies input events as they arrive in real time without matching them against any preregistered representation of any particular piece of music. The player uses various algorithmic styles to produce a musical response. Some of these styles may use small sequenced fragments, but there is never any playback of complete musical sections that have been stored in advance for retrieval in performance.

In the course of putting together a piece of music, most composers will move through various stages of work — from sketches and fragments, to combinations of these, to scores that may range from carefully notated performance instructions to more indeterminate descriptions of desired musical behavior. Particularly in the case of pieces that include improvisation, the work of forming the composition does not end with the production of a score. Sensitivity to the direction and balance of the music will lead performers to make critical decisions about shaping the piece as it is being played. Cypher is a software tool that can contribute to each of these various stages of the compositional and performance process. In particular, it supports a style of work in which a composer can try out general ideas in a studio setting, and continually refine these toward a more precise specification suitable for use in stage performance. Such a specification need not be a completely notated score: decisions can be deferred until performance, to the point of using the program for improvisation. The most fundamental design criterion is that at any time one should be able to play music to the system and have it do something reasonable; in other words, the program should always be able to generate a plausible complement to what it is hearing.

Connections between Listener and Player

Let us sketch the architecture of the program: The listener is set to track events arriving on some MIDI channel. Several perceptual features of each event are analyzed and classified. These features are density, speed, loudness, register, duration, and harmony. On this lowest level of analysis, the program asserts that all input events occupy one point in a featurespace of possible classifications. The dimensions of this conceptual space correspond to the features extracted: one point in the featurespace, for example, would be occupied by high, fast, loud, staccato, C major chords. Higher levels look at the behavior of these features over time. The listener continually analyzes incoming data, and sends messages to the player describing what it hears.

The user’s task in working with Cypher is to configure the ways in which the player will respond to those messages. The player has a number of methods of response, such as playing sequences or initiating compositional algorithms. The most commonly used method generates output by applying transformations to the input events. These transformations are usually small, simple operations such as acceleration, inversion, delay, or transposition. Although any single operation produces a clearly and simply related change in the input, combinations of them result in more complicated musical consequences.

Specific series of operations are performed on any given input event according to connections made by the user between features and transformations. Establishing such a connection indicates that whenever a feature is noticed in an event, the transformation to which it is connected should be applied to the event, and the result of the operation sent to the synthesizers. Similarly, connections can be made on higher levels between listener reports of phrase-length behavior, and player methods that affect the generation of material through groups of several events.

The preceding description provides a rough sketch of the way Cypher works. Much more detail will be provided in later chapters: from the theoretical underpinnings in chapter 4 through an exposition of the analytical engine in chapter 5, to a review of composition methods in chapter 6. Already, however, we can relate the program to the descriptive framework for interactive systems.

Considered along the first dimension, Cypher is clearly a performance-driven system. The program does not rely on stored representations of musical scores to guide its interaction with human performers. A cue sheet of score orientation points is used in performances of notated compositions, but this level of interaction is distinct from the more rigorous following techniques of score-driven programs. Further, the facilities for playing back stored scores in response are quite attenuated in Cypher. Although stored fragments can be played back in a limited way, I have never used any sequences in compositions using Cypher.

The indistinct nature of the classification metrics I propose can be clearly seen by the position of Cypher with respect to the second dimension: the program uses all three techniques, though with varying degrees of emphasis, to produce responses. Of these three classes, Cypher performs transformations most often, algorithmic generation somewhat, and sequencing hardly at all.

Finally, Cypher is a player paradigm system. Performing with the program is like playing a duet or improvising with another performer. The program has a distinctive style and a voice quite recognizably different from the music presented at its input. The player/instrument paradigm dimension is again a continuum, not a set of two mutually exclusive extremes. We can place Cypher at the player paradigm end of the scale, however, because a musician performing with it will experience the output of the program as a complement, rather than as an extension or elaboration, of her playing.

Tutorial 1 — Input and Output

In the following collection of four tutorials, the operation of Cypher from its graphic interface will be explained. Cypher itself can be downloaded from the companion CD-ROM; these tutorials will make the most sense if they are followed while working with the program. First, we need to establish the program’s channels of communication to the outside world. The upper right section of the interface, shown in figure 3.1, is used to establish output channels and program changes for the responses of Cypher.


Figure 3.1


The Bank and Timbre radio buttons together determine the sounds used to play the output of the composition section. Following normal Macintosh interface practice, exactly one radio button is active in each group: there will be one bank and one timbre chosen at any given time. The meaning of these buttons is established in a text file, called cypher.voices, which is read by the program on startup. This allows users to change the effect of the Bank and Timbre buttons simply by editing cypher.voices to match any local configuration of synthesis gear.

10                             ; number of output channels
1 2 3 4 5 6 7 8 10 11          ; output channels to use
16 11  8 29  3  9  1  3  1 21  ; bank0 programs
18 11 17  2 28 13 30  2 12  2  ; bank1 programs
24  8 15 26 16 30  6 12  4 20  ; bank2 programs
 8  1 12 19  1 24  9 14  6 16  ; bank3 programs
 1  5 26 32 30 19 14 16  7 18  ; bank4 programs
 7  9 12  1 31  9 30 11 14  3  ; bank5 programs
 7  3  8  1 16  7  8 18  5 19  ; bank6 programs
23  4 17 14 31  9  9 24 17 15  ; bank7 programs

3 2 3 4 4 3 4 5                ; # of channels per timbre

3  8 10                        ; output channels of timbre 0
8 10                           ; output channels of timbre 1
2  6 11                        ; output channels of timbre 2
1  3  4 11                     ; output channels of timbre 3
1  2  3 10                     ; output channels of timbre 4
4 10 11                        ; output channels of timbre 5
5  6 10 11                     ; output channels of timbre 6
1  5  6  7 10                  ; output channels of timbre 7

1 2                            ; input channels

Figure 3.2

A listing of a typical cypher.voices file is shown in figure 3.2. The first specification recorded, as we see, is the number of MIDI channels available for output. In this configuration, there are ten MIDI output channels. Next, the file indicates which channel numbers are associated with the ten outputs. Then a series of program changes are listed. Here, the meaning of the Bank buttons is established. When it is selected, each Bank button sends out a set of program changes, one for each output channel. Which program changes are sent is specified as shown in the listing; there are eight sets of program changes, one for each button on the interface. Using this configuration file, we see that when a user clicks on Bank button 0, MIDI channels 1, 2, 3, 4, 5, 6, 7, 8, 10, and 11 will receive the program change messages 16, 11, 8, 29, 3, 9, 1, 3, 1, and 21 respectively.

The Timbre buttons choose subsets of the ten possible channels, which will all receive copies of the MIDI Note On and Note Off messages coming from the composition section. The entries in cypher.voices shown in figure 3.2 after the specification of the Bank buttons deal with the meaning of the Timbre buttons. First, the number of channels across which output will be sent is listed, one entry for each Timbre button. Thus, following this specification, Timbre button 0 will instruct the program to send out on three channels, Timbre button 1 on only two channels, and so on. Next, the configuration file lists precisely which output channels those are. So, for Timbre button 0, we know that it will limit output to three channels. In the line followed by the comment “output channels of timbre 0” we see that those three will be channels 3, 8, and 10.
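The fan-out performed by a Timbre button can be sketched in a few lines of Python. This is an illustration only, not Cypher's own code: the function name and the table are hypothetical, and the sketch assumes that the channel numbers in cypher.voices are the conventional 1-based MIDI channel numbers, so that channel 1 corresponds to the low nibble 0 of the status byte.

```python
# Channel subsets for the first two Timbre buttons, copied from figure 3.2.
TIMBRE_CHANNELS = {0: [3, 8, 10], 1: [8, 10]}


def note_on_copies(timbre, pitch, velocity):
    """Build one raw MIDI Note On message per channel of the chosen timbre."""
    messages = []
    for channel in TIMBRE_CHANNELS[timbre]:
        # 0x90 is the Note On status; the low nibble carries channel - 1
        # (assumes 1-based channel numbers in the configuration file).
        status = 0x90 | (channel - 1)
        messages.append(bytes([status, pitch, velocity]))
    return messages


copies = note_on_copies(0, 60, 100)  # middle C, sent with Timbre button 0 active
```

With Timbre button 0 selected, a single note from the composition section becomes three messages, one each for channels 3, 8, and 10.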

Finally, two entries indicate which input channels will be monitored by the program when either the Kbd or Ivl selections are chosen from the interface. Kbd and Ivl refer to the two most commonly used sources of MIDI input signals: either some kind of MIDI keyboard controller or an IVL pitchrider, a commercially available device that converts audio signals to streams of MIDI data. On the interface, there is an entry marked Channel: followed by either Kbd, Ivl, or Nobody. By clicking on the current selection, users will cycle through the three possibilities. When the Channel: selection reads Kbd, Cypher will listen to events arriving from the channel listed as the next-to-last entry in cypher.voices (channel 1 in figure 3.2). When Channel: selects Ivl, Cypher listens to MIDI coming from the channel listed in the last cypher.voices entry (channel 2 in figure 3.2), and when Channel: selects Nobody, Cypher will not respond to MIDI events arriving from any channel.
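To make the layout of the file concrete, here is a minimal Python sketch that reads the listing of figure 3.2 in the order just described. It is not the parser used by Cypher. It assumes that semicolons begin comments, that line breaks carry no meaning beyond separating numbers, and that there are always eight banks and eight timbres; the function and dictionary key names are invented for the illustration.

```python
VOICES = """\
10                             ; number of output channels
1 2 3 4 5 6 7 8 10 11          ; output channels to use
16 11  8 29  3  9  1  3  1 21  ; bank0 programs
18 11 17  2 28 13 30  2 12  2  ; bank1 programs
24  8 15 26 16 30  6 12  4 20  ; bank2 programs
 8  1 12 19  1 24  9 14  6 16  ; bank3 programs
 1  5 26 32 30 19 14 16  7 18  ; bank4 programs
 7  9 12  1 31  9 30 11 14  3  ; bank5 programs
 7  3  8  1 16  7  8 18  5 19  ; bank6 programs
23  4 17 14 31  9  9 24 17 15  ; bank7 programs

3 2 3 4 4 3 4 5                ; # of channels per timbre

3  8 10                        ; output channels of timbre 0
8 10                           ; output channels of timbre 1
2  6 11                        ; output channels of timbre 2
1  3  4 11                     ; output channels of timbre 3
1  2  3 10                     ; output channels of timbre 4
4 10 11                        ; output channels of timbre 5
5  6 10 11                     ; output channels of timbre 6
1  5  6  7 10                  ; output channels of timbre 7

1 2                            ; input channels
"""

N_BANKS = 8    # assumed: one set of program changes per Bank button
N_TIMBRES = 8  # assumed: one channel subset per Timbre button


def parse_voices(text):
    """Read a cypher.voices-style listing into a dictionary (hypothetical layout)."""
    # Strip ';' comments, then treat the file as one flat stream of integers.
    numbers = []
    for line in text.splitlines():
        numbers.extend(int(tok) for tok in line.split(";", 1)[0].split())
    stream = iter(numbers)

    def take(count):
        return [next(stream) for _ in range(count)]

    n_out = take(1)[0]                              # number of output channels
    channels = take(n_out)                          # which channels they are
    banks = [take(n_out) for _ in range(N_BANKS)]   # program changes per bank
    sizes = take(N_TIMBRES)                         # channels per timbre
    timbres = [take(size) for size in sizes]        # channel subsets per timbre
    kbd, ivl = take(2)                              # input channels
    return {"channels": channels, "banks": banks,
            "timbres": timbres, "kbd": kbd, "ivl": ivl}


config = parse_voices(VOICES)
# Pair each output channel with the program change Bank button 0 sends to it.
bank0 = dict(zip(config["channels"], config["banks"][0]))
```

Pairing the channel list with a bank's program list, as in the last line, reproduces the correspondence described for Bank button 0: channel 1 receives program change 16, channel 2 receives 11, and so on through channel 11, which receives 21.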

Tutorial 2 — Making Connections

Cypher is equipped with a graphic interface to allow the connection of features to transformation modules, the selection of timbres, and the preservation and restoration of states. A reproduction of the control panel is shown in figure 3.3. The ovals occupying the lower third of the interface are used to configure connections between analysis messages emanating from the listener and composition methods executed in response. The upper two rows of ovals correspond to level 1 of both the listener and player; the lower ovals are used to configure level 2. We can see the various featural classifications indicated under the highest row of ovals; from left to right they are marked all (connecting a composition method to this will cause the method to be invoked for all listener messages); line (density classifications of single-note activity will cause a trigger to be sent out from this oval); oct1 (density classifications of chords contained within an octave will send a trigger from here) and so on. The scroll bar underneath the pair of level-1 ovals can be used to bring the rest of the messages and methods into view. Shown in figure 3.4 are some of the additional ovals available on level 1.

Figure 3.3


When a listener message and player method are connected, that method will be called every time the listener produces the corresponding classification. To make a connection, a user simply draws a line from a message to a method, or vice versa. In the following example, I will assume that the reader has access to the companion CD-ROM and has installed Cypher on the hard disk of a Macintosh/MIDI computer system. Once the program is running, click and hold down the mouse button inside the listener oval marked line. Then, while continuing to hold down the button, draw a line to the player method marked invt. Inside the method oval, release the mouse button. A line should continue to appear between the two ends of the connection.

Figure 3.4


Now, whenever single notes (as opposed to chords) are played to the program, Cypher will respond with a pitch which is the inversion of the pitch played, around an axis of symmetry located on the E above middle C (MIDI note number 64). For example, if the performer plays the F above middle C (MIDI note number 65), Cypher will play Eb (MIDI 63). The performer plays G (MIDI 67), Cypher plays Db (61), and so on. Direction is reversed – as the performer plays upward, Cypher goes down; the performer goes down, Cypher, up. Playing a chord will not provoke a response, however: Cypher is silent. This is because there are no composition methods associated with the density classifications for chords.
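The arithmetic of this inversion is simple enough to state directly: a pitch is reflected around the axis, so the output lies as far below MIDI note 64 as the input lies above it. The following Python sketch reproduces the examples from the text; the function name is hypothetical and the code is not taken from Cypher.

```python
AXIS = 64  # E above middle C, the axis of symmetry named in the text


def invert(pitch, axis=AXIS):
    """Reflect a MIDI note number around the axis of symmetry."""
    # Note: an input of 0 would map to 128, outside the MIDI range; how
    # Cypher treats that case is not stated, so this sketch ignores it.
    return 2 * axis - pitch


played = [65, 67, 64, 60]
answered = [invert(p) for p in played]
```

An ascending line at the input therefore comes back as a descending line, and a note on the axis itself is returned unchanged.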

Figure 3.5


Let us make a second connection, between the listener message oct1 and the player method arpg. Oct1 indicates that a chord has been heard, with all of its constituent pitches contained within a single octave. The arpg transformation method will take all of the pitches given to it in a chord and play them back out as an arpeggio, that is to say, one after another. Now, Cypher has two different responses, which change according to the nature of the music being played by the performer. Single notes will still trigger the inversion behavior, and chords within an octave will be arpeggiated. Chords wider than an octave will still provoke no response, because again there is no associated player method.

In figure 3.6, the listener ovals at one end of the connection have gone off the left of the screen. When the user draws a connection to ovals off either end of the interface, the ovals showing will scroll automatically to bring others into view, as the user continues to draw either left or right. The end of the connection is made normally, and the screen will be left with the origin of the connection somewhere off the end of the display, as in figure 3.6.

Figure 3.6


Now draw a third connection between the listener message oval marked loud, and the transformation method tril. The first two connections we made concerned the same feature (density). Either one or the other was active. The third connection is based on a different feature, loudness. This can be present or absent at the same time as either of the other two. The connection indicates that whenever the performer plays loudly, the trill transformation will be invoked. Now, quiet single notes will provoke the same response as before: they are inverted around the middle of the keyboard. Loud single notes, however, will be inverted, and have a trill attached to them. Similarly, quiet chords within an octave will be arpeggiated. Loud chords will be arpeggiated, and each note of the arpeggio will be trilled. Even chords wider than an octave now will be played back with a trill, if they are performed loudly. Only wide chords played quietly have no response at all.

Figure 3.7


Any connection between a feature and a transformation may be broken again by clicking once in an oval at either end of the line. In fact, clicking on an oval will disconnect all the lines arriving at that oval: if several transformations are connected to a single feature, clicking on that feature will disconnect all of them. Clicking on the transformation end of a connection, in contrast, leaves the feature's links to other transformations intact, but severs every link arriving at that transformation, including any from other features. The Presets check boxes are simply collections of connections; they are toggles, which alternate between making and breaking certain sets of connections. As many as desired can be turned on at any one time.

A feature may be connected to any number of transformation modules; similarly, a transformation module may be connected to any number of features. If a configuration indicates that several methods are to be invoked, the transformations are applied in series, with the output of one being fed through to the input of the next. This is why, when a loud chord is played in the example configuration, a series of trills results. The first transformation applied is arpg, which splits the chord into separate notes. Then, this result is passed to tril, adding trills to each one of the individual chord members.

Because they are invoked in series, the same collection of transformations will produce different outputs depending on the order of their execution. The priorities of the transformations are reflected and established by their appearance on the interface. When several modules are to be applied to an input event, they are invoked from left to right as they appear on the screen. To reorder the priorities, a module can be “picked up” with the mouse and moved right or left, giving it a lower or higher priority relative to the other transformations. This new ordering of transformation priorities will be stored in the program’s long-term memory, so that the next invocation of the program will still reflect the most recent ordering.
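The example configuration from this tutorial, and the effect of execution order, can be imitated with a small Python sketch. Nothing here is Cypher's actual code. The velocity threshold for "loud," the reading of "within an octave" as a span of at most twelve semitones, and the shape of the trill are all assumptions made for the illustration, as are the function names.

```python
AXIS = 64   # inversion axis named in the text (E above middle C)
LOUD = 100  # hypothetical velocity threshold for the "loud" feature

# An event is a list of (pitch, velocity) notes that sound together.
# Every method maps a list of events to a new list of events.


def invt(events):
    """Mirror each pitch around AXIS."""
    return [[(2 * AXIS - p, v) for p, v in ev] for ev in events]


def arpg(events):
    """Split each chord into one single-note event per chord member."""
    return [[note] for ev in events for note in ev]


def tril(events):
    """Alternate each event with a copy one semitone higher (assumed trill shape)."""
    out = []
    for ev in events:
        upper = [(p + 1, v) for p, v in ev]
        out.extend([ev, upper, ev, upper])
    return out


METHODS = {"invt": invt, "arpg": arpg, "tril": tril}


def classify(event):
    """Return the set of level-1 feature names noticed in one event."""
    pitches = [p for p, _ in event]
    features = set()
    if len(event) == 1:
        features.add("line")
    elif max(pitches) - min(pitches) <= 12:  # assumed reading of "within an octave"
        features.add("oct1")
    if max(v for _, v in event) >= LOUD:
        features.add("loud")
    return features


def respond(event, connections, priority):
    """Apply every connected method in priority order; no methods means silence."""
    active = {m for f in classify(event) for m in connections.get(f, [])}
    if not active:
        return []
    events = [event]
    for name in priority:
        if name in active:
            events = METHODS[name](events)  # output of one feeds the next
    return events


connections = {"line": ["invt"], "oct1": ["arpg"], "loud": ["tril"]}
loud_chord = [(60, 110), (64, 110), (67, 110)]

arpg_first = respond(loud_chord, connections, ["invt", "arpg", "tril"])
tril_first = respond(loud_chord, connections, ["invt", "tril", "arpg"])
```

With arpg ahead of tril, each chord member is trilled in turn; with the order exchanged, the whole chord is trilled first and each resulting chord is then arpeggiated. Both orderings yield twelve single notes, but in a different succession.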

Tutorial 3 — Level-2 Connections

The third and fourth rows of ovals on the interface represent the listener and player on the second level. The listener ovals (again, those higher on the panel) refer to the regularity and irregularity of the first-level features: irrg, for instance, stands for irregular register, meaning that the registers of incoming events are changing within phrases. The complementary analysis is indicated by rgrg, which stands for regular register, and means that the events in the current phrase are staying in a constant register more often than not (Figure 3.8). From left to right, the features that are tracked for regularity are density (ln), register (rg), speed (sp), dynamic (dy), duration (dr), and pitch (pi). The classification of regularity and irregularity is explained in chapter 5.

Figure 3.8


Besides the messages dealing with regularity and irregularity, there are some other level-2 listener messages. As on level 1, the first oval is marked all, which will invoke a connected player method on every incoming event. The second oval is phrs, which sends out a message on every event that Cypher takes to be the beginning of a new phrase. After the regularity/irregularity pairs, twelve key roots are listed: the oval marked C, for example, will send out a message for every event when the listener believes the current key is C major or C minor.

The lowest row of ovals represents the level-2 player methods. Connections between listener messages and player methods are accomplished in the same way as for level 1. Let us make a connection between the listener oval marked phrs, and the player oval with the same marking (Figure 3.9).

Figure 3.9


The phrs player method makes a thumping sound. The phrs listener message is sent out whenever Cypher thinks a new phrase has begun. Therefore, by connecting the two, we will hear the thump at all of the phrase boundaries identified by the program. You will notice that while Cypher’s choices are not entirely accurate, neither are they entirely arbitrary. Again, chapter 5 covers in some depth how phrase boundaries are located.

Now we will add two more connections: from rgln to vryd and from irln to nodn (Figure 3.10). With these connections we are calling up player methods from the regularity/irregularity reports concerning density. Rgln is an abbreviation of “regular line” (using “line” to indicate vertical density). Irln, then, is an abbreviation of “irregular line.”

Figure 3.10


The regularity message is connected to the VaryDensity method: this process looks to see whether lines or chords are currently being played and instructs the player to do the opposite. That is, if the performer is playing lines, these will be transformed into chords, and chords will be changed into lines through arpeggiation. The complementary message, indicating irregular density behavior, is connected to the NoDensity method. This disconnects any of the methods used by VaryDensity, negating that operation. The regularity classification is done across the current phrase. Assuming the phrs/phrs connection is still active, at each new phrase boundary Cypher will play a thump. Then depending on whether predominantly chords or lines are played (regular behavior) or a mixture of the two (irregular behavior), the program will either play with the opposite density or not respond at all.

Tutorial 4 — Menus

The File menu provides a number of commands concerning the state save-and-restore mechanism, and the execution of Cypher itself. The first three menu items have to do with opening and closing state files. A state captures all of the current program settings; saving a state means storing all of the connections, timbre and bank settings, and other control variables active at the time of the save. Restoring a state returns the program to some given collection of settings.

The state file menu items follow the layout of the usual Macintosh application File menu: New, Open, and Close. A state file contains many individual state records. With one file open, a user can save up to 100 different records. When a user opens a different file, the state records contained in it (again up to 100) become available. The New command opens a new state file, which will then be ready to receive and retrieve state records. Open is similar, but will present a dialog box with the names of all currently available state files, from which the user may choose. Finally, Close will close any currently open state file, clearing the way for a different one to be used (only one state file may be open at a time).

The next set of entries saves and restores individual states in the state file opened with the preceding commands. Save will save the current program settings to a file, and Restore will restore a state from an open file. The record number for the Save and Restore commands is indicated with the state slider. If the slider is positioned on zero, a Save command will put the current program settings in an open file at record zero. The Restore command would retrieve the settings from record zero. To use a different record, the state slider must be moved to a new position before executing either the Save or Restore command, which will then use the new record position in the file to perform its operation. The Dump command instructs the program to save each state as it passes through states with the score-orientation process. When Dump is turned on, a complete state file for a given composition will be saved automatically. The final File menu entry is Quit, which exits the program.

The second available menu is the Alg (for Algorithm) menu. The entries each invoke one of the compositional algorithms described in chapter 6. The performance parameters for these invocations are compiled into the program; relatively inflexible, this menu is primarily provided as a means of testing and demonstrating the basic texture implemented by each algorithm.

The remaining menu is Rehearse. The Single and Free commands will be covered in the section on score orientation. Hung is a MIDI panic button, used to terminate hung notes. Anyone who has used MIDI will be familiar with the phenomenon of the stuck note, which occurs when a Note On command somehow gets sent out without the corresponding Note Off command. When that happens, the sound simply goes on interminably. The Hung menu item will silence any hung notes, by the rather direct device of sending out a Note Off command to every note number on every channel. The rudely named command ShutUp will also terminate any stuck notes, with the further effect of preventing the program from producing any more sound. Sound will again issue from the program when the Clear button is clicked; Clear will also disconnect all level-1 and level-2 listener and player agents.
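The brute-force character of the Hung command is easy to see when written out. The Python sketch below builds the raw messages only; it does not send them, and it is not Cypher's code. It assumes the command covers all sixteen MIDI channels rather than only the output channels listed in cypher.voices, and it uses a release velocity of zero, which the text does not specify.

```python
def all_notes_off():
    """Build a Note Off message for every note number on every MIDI channel."""
    messages = []
    for channel in range(16):        # MIDI channels 1-16, encoded as 0-15
        for note in range(128):      # every MIDI note number
            # 0x80 is the Note Off status; the final byte is the release
            # velocity (0 is an assumption of this sketch).
            messages.append(bytes([0x80 | channel, note, 0]))
    return messages


panic = all_notes_off()
```

The result is 2,048 three-byte messages, which explains why such a panic command is reserved for emergencies rather than sent routinely.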

The text controller Echo cycles through the possibilities On and Off. When Echo is on, the MIDI data arriving from the device indicated by the channel input controller will be copied and sent out to the synthesizers selected by the Bank and Timbre buttons. This provides a convenient way to work on the cypher.voices file, trying out different voice combinations by directly playing them from the keyboard, for instance, rather than hearing them only as an expression of program output.

3.2 Score Following and Score Orientation

Now that we have seen how Cypher is used, we will review music that has been made with it. Before doing so, let us set the stage for that review, and the examination of other interactive systems following it, by looking at ways to coordinate machine and human performances. Two divergent means to the same end are considered: score following allows a tight synchronization between a solo part and a computer accompaniment; score orientation refers to a range of applications that coordinate human and computer players by lining them up at certain cue points in a composition.

Score Followers

The first dimension in our interactive system classification scheme distinguishes between score-driven and performance-driven programs. The paradigmatic case of score-driven systems is the class of applications known as score followers. These are programs able to track the performance of a human player and extract the tempo of that performance on a moment-to-moment basis. The best theory of the current tempo is then used to drive a realization of an accompaniment (Vercoe 1984; Dannenberg 1989).

A pattern-matching process directs the search of a space of tempo theories. The derived tempo is used to schedule events for an accompanying part, played by the computer, whose rhythmic presentation follows the live performer – adapting its tempo to that of its human partner. A comparison of the time offsets between real events and the time offsets between stored events (to which the matching process has decided that the real events correspond) yields an estimate of the tempo of the real performance and allows the machine performer to adjust its speed of scheduling the accompaniment accordingly.
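The comparison of time offsets can be reduced to a ratio. The Python sketch below is a deliberately simple stand-in and does not reproduce the method of any of the cited systems: it assumes the matcher has already paired each performed event with a stored one, it looks only at a short window of recent matches, and its function names and the window size are invented for the illustration.

```python
def tempo_ratio(stored_onsets, performed_onsets, window=4):
    """Estimate performance speed relative to the stored score.

    Both lists hold onset times in milliseconds for events that have
    already been matched pairwise. A result above 1.0 means the live
    player is faster than the stored version.
    """
    stored = stored_onsets[-window:]
    played = performed_onsets[-window:]
    if len(stored) < 2 or len(played) < 2:
        return 1.0                       # not enough evidence yet
    stored_span = stored[-1] - stored[0]
    played_span = played[-1] - played[0]
    if stored_span <= 0 or played_span <= 0:
        return 1.0                       # degenerate timing; keep the stored tempo
    return stored_span / played_span


def scaled_delay(stored_delay, ratio):
    """Rescale a stored accompaniment delay to the estimated live tempo."""
    return stored_delay / ratio


ratio = tempo_ratio([0, 500, 1000, 1500], [0, 400, 800, 1200])
next_wait = scaled_delay(500, ratio)
```

Here the soloist covers in 1200 milliseconds what the stored score allots 1500, so the accompaniment's next stored delay of 500 milliseconds is shortened to 400.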

The classic example of score following uses a computer to accompany a human soloist in the performance of a traditional instrumental sonata. As the human soloist plays, the computer matches that rendition against its representation of the solo part, derives a tempo, and plays the appropriate accompaniment. The human can speed up or slow down, scramble the order of notes, play wrong notes, or even skip notes entirely without having the computer accompanist get lost or play at the wrong speed. Such applications are unique in that they anticipate what a human player will do: the derivation of tempo is in fact a prediction of the future spacing of events in a human performance based on an analysis of the immediate past.

Score followers are score-driven, sequenced, player-paradigm systems. The pattern matching must be reliably fault tolerant in the face of wrong notes, skewed tempi, or otherwise “imperfect” performances rendered by the human player. Barry Vercoe’s program is able to optimize response to such performance deviations by remembering the interpretation of a particular player from rehearsal. Successive iterations of rehearsal between machine and human result in a stored soloist’s score which matches more precisely a particular player’s rendering of the piece (Vercoe 1984).

The application described in (Dannenberg and Mont-Reynaud 1987) is able to perform real-time beat tracking on music that is not represented internally by the program. The beat-tracking part develops a theory of a probable eighth-note pulse, and modifies the tempo of this pulse as additional timing information arrives. At the same time, the harmonic section of the program navigates its way through a chord progression by matching incoming pitches from a monophonic solo against sets of the most probable pitches for each successive chord. Here, the principles of score following are expanded to implement a similar functionality – performing an accompanimental part against an improvisation. Some things about the improvisation are known in advance — for example, the chord progression — however, the precise sequence of notes to be played, and their timing relations, are not known. Here we can see the essence of score following, comparing a performance against a stored representation of some expected features of that performance, carried out on a level other than note-to-note matching. Such a program moves over toward the performance-driven end of the scale and demonstrates one potential application area for these techniques that goes beyond the classic case.

The musical implications of score following in its simplest form are modest: such programs are essentially tempo-sensitive music-minus-one machines, in which the computer follows the human performer rather than the other way around. In most compositional settings, however, score following is used to coordinate a number of other techniques rather than to derive the tempo of a fixed accompaniment. In many of the compositions recently produced at IRCAM, for example, the computer part realizes a sophisticated network of real-time signal processing and algorithmic composition, guided throughout the performance by a basic score-following synchronization system. In his paper “Amplifying Musical Nuance,” Miller Puckette describes this technique of controlling hidden parameters as a human player advances through the performance of a score (Puckette 1990). The hidden parameters are adjusted, and sets of global actions executed, as each successive note of the stored score is matched.

The notes, and more especially the global actions, need not merely select synthesizer parameters. In practice, much use is also made in this context of live “processes” which may generate a stream of notes or controls over time; global actions can be used to start, stop, and/or parameterize them. Notes may themselves be used to instantiate processes, as an alternative to sending them straight to the synthesizer. If a “process” involves time delays (such as in the timing of execution of a sequence), this timing may be predetermined or may be made dependent on a measure of tempo derived from the live input. (Puckette 1990, 3)

The following section describes some of the basic tools in Max used to accomplish the kinds of score synchronization previously described.

Follow, Explode, and Qlist

In Max, several objects support score following, including follow, explode, and qlist. The follow object works like a sequencer, with the added functionality that a follow message will cause the object to match incoming Note On messages against the stored sequence. Matches between the stored and incoming pitches will cause follow to output the index number of the matched pitch. Explode is similar to follow, with a graphic editor for manipulating events. Qlist is a generalized method for saving and sending out lists of values at controlled times.

In the patch in figure 3.11, the outlets of notein are connected to stripnote, yielding only the pitch numbers of Note On messages. These are sent to a makenote/noteout pair, allowing audition of the performance (albeit a rather clumsy one, since all the velocities and durations are made equal by the arguments to makenote). Further, the incoming pitch numbers are sent to a follow object, to which is connected a number of message boxes, lined up in the upper right corner of the patch. Clicking on the record message will cause follow to start recording the pitches arriving from stripnote. Then, clicking on start will make follow play the pitches back. The note numbers are sent out follow’s right outlet, where we can see them in the number box and hear them through makenote/noteout. Clicking stop will halt either recording or playback. Finally, once a sequence of note numbers is stored in the follow object, clicking on follow 0 will start the object looking for the pitches of the sequence to be played again from the beginning. Each match between the stored sequence and new performance causes the index number of the matched pitch in the sequence to be sent out the left outlet; we can track the progress of the match with the number box attached to that outlet. Users can experiment with the leniency of the matching algorithm: playing some extra notes will not affect the match of the sequence when it is resumed. Skipping a note or two will cause the matcher to jump ahead as well. More radical deviations from the stored version, however, will eventually make the machine lose its place.
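The leniency described above can be illustrated with a small sketch in C. This is not the algorithm used by the follow object, whose internals are not given here; it is a simplified model in which extra notes are ignored and up to MAX_SKIP stored notes may be skipped. The names Follower, follower_start, follower_match, and MAX_SKIP are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical tolerance: how many stored notes may be skipped at once. */
#define MAX_SKIP 2

typedef struct {
    const int *pitches;  /* stored sequence of MIDI note numbers */
    size_t length;       /* number of stored notes */
    size_t next;         /* index of the next note we expect to hear */
} Follower;

/* Begin following from a given index, in the spirit of "follow 0". */
void follower_start(Follower *f, const int *pitches, size_t length, size_t from)
{
    f->pitches = pitches;
    f->length = length;
    f->next = from;
}

/* Offer one incoming pitch. Returns the index of the matched stored note,
   or -1 if the pitch is treated as an extra note and ignored. */
long follower_match(Follower *f, int pitch)
{
    size_t i;
    for (i = f->next; i < f->length && i <= f->next + MAX_SKIP; i++) {
        if (f->pitches[i] == pitch) {
            f->next = i + 1;   /* jump ahead past any skipped notes */
            return (long)i;
        }
    }
    return -1;                 /* no match nearby: position unchanged */
}
```

In this model, a performer who omits more than MAX_SKIP notes in a row leaves the matcher waiting for pitches that never arrive, which is one simple way a follower can lose its place.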

Figure 3.11

Explode is a graphic score editor built on top of the basic functionality of follow. Figure 3.12 shows an explode window: the black dashes in the main grid represent MIDI note events, where the x axis represents time, and the y axis note number. The chart above the grid makes it possible to change the quantities being represented by the y axis (y), the width of the dashes (w), or the numerical legends shown above each dash (#). The way the graph in figure 3.12 is set up, y shows pitch, w represents the duration of each event, and # shows the associated channel number. Further, the elongated rectangles next to the five parameter names (time, pitch, velo, length, chan) can be used to edit events directly. Whenever a single event, or group of events (selected with the shift key down), is selected, typing numbers in these parameter boxes will set all selected events to have the entered parameter value.

Figure 3.12

With the three icons in the upper left corner, the user can choose tools to move existing events (the arrow), stretch events in time (the horizontal arrow), or draw completely new ones (the pencil). Once a sequence has been specified, it can enter into the same relations with other messages and objects as follow: that is, it can play back the material, or follow another MIDI stream as it performs the sequence, sending out the index numbers of matched events.

A qlist is an object that manages lists of values, sending them out in response to messages. Lists in a qlist are separated by semicolons, which distinguish them one from another. The lists themselves can be either simple lists of numbers or lists with a designated receiver. In figure 3.13, the window labeled Untitled was obtained by double-clicking on the qlist object. Within this window, lists of numbers are indicated, separated by semicolons. When the qlist receives a rewind message, it will reset its pointer to the beginning of the list. Next messages then will step through the lists, sending the stored values either through the outlet of the qlist object (for unlabeled lists) or to the receiver of a named variable. Successive next messages sent to the qlist in figure 3.13, then, will set the first three number boxes to read 1, 2, 3, and the number boxes below the variable receiver to read 4, 5, 6. All number boxes will then be reset to zero, as the qlist continues.

Figure 3.13

Explode and qlist can be used together as a general framework for score following. The score to be matched can be entered and tracked using the explode object. Event markers are attached to the explode score, to signal when the player arrives at points requiring a response. When a marker is found, a next message is sent to the qlist, which will forward the next record of values and messages to objects connected to its outlet or mentioned as receivers. See the score_following.demo prepared by Cort Lippe, on the companion CD-ROM, for an example of these objects working together.

Score Orientation

Score following is a powerful technique for tracking completely stored scores. Several programs have implemented related mechanisms for finding cues at specified points in a composition: rather than matching every incoming event against a score, these systems are concerned with synchronizing human and machine performances on a much coarser scale. Because such mechanisms are designed not to follow note-by-note performance but to find certain landmarks, I refer to them generically under the rubric of score orientation.

During a musical performance, an interactive program must be able to advance from one type of behavior to another. One way to advance it is through user manipulation of some interface, be it an alphanumeric keyboard, foot pedals, or graphic control panel. Score following can similarly drive a sequence of program states, for example when used in conjunction with a qlist. Score orientation is essentially the same idea. The goal of the technique is to advance the computer program from one state to another as the performance of some composition progresses. In contrast to score following, however, score orientation only examines the incoming performance at certain times, looking for some cue that marks the point at which the next program configuration should be introduced.

An important part of score-following applications is that they maintain a representation of the expected performance such that it can be efficiently matched in real time. Another critical component is the pattern matcher itself, which must accept performances containing unforeseeable deviations from the stored version and be able to find the correct point in the score after a period of instability in the match. Score orientation differs from score following in these two critical areas. In score orientation, the program is following the human performance to find those points in the composition where it is to advance from one state to the next. The technique does not follow a complete representation of the human player’s part but is only scanning its input for cues at those moments when it expects to perform a state change. Further, score orientation can be accomplished with much less sophisticated pattern matching. Cues can usually be found that will be unambiguous enough inside a given time frame for the orienter simply to search for that single event.

Daniel Oppenheim’s Dmix system implements score facilities that straddle the following/orientation distinction outlined here. He notes a difference between horizontal and vertical tracking, in which horizontal tracking corresponds to the classic score-following case. Vertical tracking allows a more spontaneous connection between a live performer and computer response, in which improvisations or interpretations by a performer would control variables of ongoing compositional algorithms, affecting a much broader range of parameters within the musical context than tempo alone. Dmix is an object-oriented, integrated system for composition and performance; within it, the SHADOW mechanism can treat collections of events as scores for either horizontal or vertical tracking. Moreover, such scores can be treated by other parts of Dmix (an extension of Smalltalk-80) for simulating a performance, editing, or transfer to disk (Oppenheim 1991).


Cypher’s score-orientation process uses a technique called windowing. A window is a designated span of time. During the duration identified by a window, the program is searching for a particular configuration of events; when that configuration is found, the program state is updated and, if necessary, the opening of another window is scheduled. If the desired configuration has not been found by the end of a window, the state change and scheduling of the next window are done anyway. This ensures that the program will never remain in a given state indefinitely waiting for any single cue.
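The behavior of a single window can be sketched as a small polling function. This is a minimal illustration in C, not Cypher's own code: the type and function names are hypothetical, and the absolute open and close times are assumed to have been computed elsewhere.

```c
typedef enum {
    WIN_PENDING,    /* window has not opened yet */
    WIN_SEARCHING,  /* window is open, target not yet seen */
    WIN_FOUND,      /* target seen while the window was open */
    WIN_EXPIRED     /* window closed without a match */
} WindowStatus;

typedef struct {
    long open_ms;   /* absolute time at which searching begins */
    long close_ms;  /* absolute time at which the window gives up */
} Window;

/* Poll the window at time now_ms. cue_seen is nonzero if the target
   configuration is present in the input at this moment. */
WindowStatus window_poll(const Window *w, long now_ms, int cue_seen)
{
    if (now_ms < w->open_ms)
        return WIN_PENDING;      /* too early: input is not examined */
    if (now_ms >= w->close_ms)
        return WIN_EXPIRED;      /* timed out: advance anyway */
    return cue_seen ? WIN_FOUND : WIN_SEARCHING;
}

/* Both a found cue and an expired window advance the program state,
   so the program never waits indefinitely on one cue. */
int window_should_advance(WindowStatus s)
{
    return s == WIN_FOUND || s == WIN_EXPIRED;
}
```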

Figure 3.14

Cues can be missed for a variety of reasons; in practice, one missed target will not throw off the orientation. Usually, the next window successfully finds a match near its leading edge. If two or three windows in a row pass without a match, however, the orienter is irretrievably lost, and must be righted through external intervention. But even if nothing were ever played to the score orienter, the program would eventually make it to the end of the piece – it would just no longer be synchronized with an external performer. The graphic interface provides mechanisms for beginning execution at any given state and for recovering from missed cues, if this should be necessary in performance.

There are six parameters associated with a score-orientation window: (1) the time offset between the identification of the preceding orientation cue and the expected arrival of the next one; (2) the leading edge of the window, or time in advance of the expected arrival of the target event, at which to begin looking for it; (3) the trailing edge of the window, the duration after the expected arrival of the target event during which to continue looking for it; (4) the type of event being targeted (pitch, attack, pattern, etc.); (5) the specific event of that type to be located (pitch number, attack point, pattern description, etc.); and (6) the routine to be invoked when a target has been found. All of these parameters are coded by the user into a cue sheet, which the score orientation process will use to schedule and execute polling for the designated events during performance.

The following is a typical entry for the cue sheet of a Cypher composition.

    { 7080L, 2500L, 4000L, PITCH, 59 }

There are five fields in the structure (the sixth parameter, the routine to be invoked, does not appear in this entry). The fields correspond to the expected arrival time of the cue (7080 milliseconds after the previous cue); the time in advance of the expected arrival to open the search window (2500 milliseconds); the time after the expected arrival to continue the search (4000 milliseconds); the type of cue to look for (PITCH); and which pitch to notice (59, or the B below middle C). With these settings, the duration of the search window is 6.5 seconds, with the cue expected 2.5 seconds after the window opens.
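As a sketch of how such an entry might be declared and used, the following C fragment gives the five fields names and derives the window boundaries from them. The struct layout, field names, and helper functions are assumptions made for illustration, as are the enumerator names other than PITCH; only the field order and the example values come from the entry above.

```c
/* Cue types; the names other than PITCH are hypothetical spellings. */
typedef enum { PITCH, ATTACK, ROOT, FUNCTION, TIME, NO_ATTACK } CueType;

typedef struct {
    long offset_ms;    /* expected arrival, measured from the previous cue */
    long leading_ms;   /* how long before the expected arrival to open */
    long trailing_ms;  /* how long after the expected arrival to stay open */
    CueType type;      /* kind of event to look for */
    int value;         /* which event of that kind, e.g. a MIDI pitch */
} CueEntry;

static const CueEntry example_cue = { 7080L, 2500L, 4000L, PITCH, 59 };

/* Absolute time at which the search window opens. */
long cue_window_open(const CueEntry *c, long prev_cue_ms)
{
    return prev_cue_ms + c->offset_ms - c->leading_ms;
}

/* Absolute time at which the search window closes. */
long cue_window_close(const CueEntry *c, long prev_cue_ms)
{
    return prev_cue_ms + c->offset_ms + c->trailing_ms;
}

/* Total length of the search window. */
long cue_window_length(const CueEntry *c)
{
    return c->leading_ms + c->trailing_ms;
}
```

If the previous cue was identified at 10000 milliseconds, this entry gives a window that opens at 14580 and closes at 21080, a span of 6500 milliseconds.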

There are six different cue types for events targeted by the score orientation routine. These are:

  • Pitch. Any MIDI pitch number can be specified as the target event.
  • Attack. At the opening of the window, a pointer is set to the most recent incoming MIDI event; the next event past this pointer — that is, the next event played — will trigger the state change.
  • Root. A chord root, expressed in absolute terms (C, F, Ab, etc.) as opposed to relative terms (I, IV, V, etc.), can be used as a target event.
  • Function. Chord roots described as functions — that is, in relative terms (I, IV, V, etc.) — can be used as targets with this event type.
  • Time. This option essentially bypasses the windowing mechanism. The state change will be executed when the duration indicated has elapsed, regardless of other conditions.
  • No Attack. This type uses the Boredom parameter; when no MIDI input has come for a duration exceeding the Boredom limit, no-attack-type state changes will be executed.

This collection of cue types reflects the demands of the compositions learned by this implementation of Cypher. They have the virtue of simplicity and are quite robust; however, they are by no means an exhaustive list of the types of cues one might want, or even of the cue types the program could easily accommodate. In essence, any of the messages coming from a listener could be used as a cue; a certain register identification, or a phrase boundary, for example, could be recognized by the score orientation process and used to trigger a state change. This list should be regarded as an example, motivated by the circumstance of a few compositions; the complete list would be simply an inventory of all possible listener messages.

Once a cue has been found, any routine can be called with any number of arguments. A pointer to the routine and its arguments are saved when the specifications for the window are first introduced. In Cypher compositions, the arrival of a cue triggers the invocation of a score point, a typical example of which is illustrated in figure 3.15.

/* m 33 */
    SynthManager(SET, 1);
    TimbreManager(SET, 3);
    HookupLink(0, Grac, OR, Lin);
    HookupLink(0, Trns, OR, Low);
    HookupLink(0, Trns, OR, Midhi);
    HookupLink(0, Glis, OR, High);
    HookupLink(0, Accl, OR, Slow);
    HookupLink(0, Phrs, OR, Fast);
    HookupLink(0, Acnt, OR, Short);

Figure 3.15

In this bit of code, the sounds used by the composition section are changed with the SynthManager() and TimbreManager() calls, which install program changes and channel configurations. The rest of the score point makes connections between listener messages and player methods on level 1. The same effect could be achieved by calling up states saved in a state file; in fact, a parallel state file does exist for each piece, containing all the information laboriously written out here. Using this text encoding of the state changes is largely a convenience, as it allows examination of the state in a relatively succinct and easily editable form.

Score orientation has some attractive features, among them the fact that search is performed only around those moments where something needs to be found and that it continues to be useful in the face of scores that change from performance to performance; if anything about the context in an indeterminate section is constant, it can still be used to cue the score orienter. The technique has, however, some equally obvious problems. First among them is the fact that when a window ends without locating the target, the system has no way to tell whether the target event was not found because it was played before the window opened, or because it has yet to be played, or because an error in the performance means that it will never be played. Therefore, the program itself cannot adjust the positioning of the following window, because it does not know if its schedule of cues is running early or late. The cure for this problem would be to provide the orienter with better targets – not single pitches or attacks, but small local contexts. If most of the recognizable information from a whole measure is provided, for instance, missing one note within the measure would have no effect on the orienter.
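One way to picture the proposed cure is a cue that is satisfied once most of a small set of target pitches has been heard. The C sketch below is purely illustrative and does not describe anything implemented in Cypher: the ContextCue type, the threshold field, and the decision to ignore the order of arrival are all assumptions.

```c
#include <stddef.h>

#define MAX_CONTEXT 16  /* hypothetical upper bound on context size */

typedef struct {
    int pitches[MAX_CONTEXT];  /* target pitches drawn from one measure */
    int heard[MAX_CONTEXT];    /* nonzero once the matching pitch arrives */
    size_t count;              /* number of target pitches in use */
    size_t needed;             /* how many must be heard to accept the cue */
} ContextCue;

/* Record one incoming pitch; returns nonzero once enough of the
   context has been heard to treat the cue as found. */
int context_cue_hear(ContextCue *c, int pitch)
{
    size_t i, hits = 0;
    for (i = 0; i < c->count; i++) {
        if (!c->heard[i] && c->pitches[i] == pitch) {
            c->heard[i] = 1;   /* consume one unheard occurrence */
            break;
        }
    }
    for (i = 0; i < c->count; i++)
        hits += c->heard[i] ? 1 : 0;
    return hits >= c->needed;
}
```

With four target pitches and a threshold of three, a performance that drops any one of the four notes still satisfies the cue.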

Rehearsal Mechanisms

One of the most critical components of an interactive system is a rehearsal mechanism. Artificial performers should be able to integrate with normal rehearsal techniques as seamlessly as possible. This demands two things of the program: that it be able to start anywhere in the score, and that it be able to get there quickly. Musicians tend to start at structural breaks in the music and may start at a certain bar several times in rapid succession. Or they may rehearse a section of the piece repeatedly. If it takes the computer markedly longer than the human players to be prepared to start playing from a given point in the score, human players’ patience with their neophyte performance partner may flag at the most critical juncture, during rehearsal.

In Cypher, a performance file is made up of complete state records for every change of state in a composition. A state is the set of variables whose values uniquely determine the behavior of the program. These variables include the connections between input features and transformation modules, output channel configurations, and the like. Each state comprises about 2000 bytes of data and so can easily be saved in great numbers without taxing the storage capacity of personal computers. Advancing from one state to the next is accomplished through the windowing score-orientation technique.

The Rehearse menu and state slider in the upper left on the control panel make up the state save-and-restore mechanism. The slider is used to select a state record number; these range from zero to sixty, though the upper limit is completely arbitrary and easy to change. To record a state to memory, the user first positions the slider so that the state number showing beneath it corresponds to the desired state record in the file. Then the item Save is selected from the File menu. This will record the current state shown on the interface to the currently open file, at the given state record number. If no state file is open, a dialog box will first appear asking which file is meant to receive the new record. The complementary operation is performed in response to the Restore item in the File menu. Again the record number is accessed from the slider, and then the requested state is retrieved from the open file. If no state file is open, a dialog box asks for the specification of one. If no such record is present in the open file, an error message is generated.
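Because every state record has the same size, saving and restoring by record number reduces to seeking to a computed offset in the file. The following C sketch assumes fixed 2000-byte records and an opaque byte array in place of the real state variables; the names StateRecord, state_save, and state_restore are hypothetical and the file layout is not taken from Cypher.

```c
#include <stdio.h>

#define STATE_BYTES 2000   /* approximate record size given in the text */
#define MAX_STATES  61     /* records 0 through 60; an arbitrary limit */

typedef struct {
    unsigned char data[STATE_BYTES];  /* opaque stand-in for the variables */
} StateRecord;

/* Write one record at the slot chosen by the slider. Returns 0 on success. */
int state_save(FILE *fp, int slot, const StateRecord *s)
{
    if (slot < 0 || slot >= MAX_STATES)
        return -1;
    if (fseek(fp, (long)slot * (long)sizeof *s, SEEK_SET) != 0)
        return -1;
    return fwrite(s, sizeof *s, 1, fp) == 1 ? 0 : -1;
}

/* Read the record at the given slot. Returns -1 if it cannot be read,
   for instance because the file ends before that record. */
int state_restore(FILE *fp, int slot, StateRecord *s)
{
    if (slot < 0 || slot >= MAX_STATES)
        return -1;
    if (fseek(fp, (long)slot * (long)sizeof *s, SEEK_SET) != 0)
        return -1;
    return fread(s, sizeof *s, 1, fp) == 1 ? 0 : -1;
}
```

This sketch detects a missing record only when it lies beyond the end of the file; a slot below the highest one written but never saved would need an explicit marker to be reported as absent.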

The interface accesses two modes of rehearsal: Single and Free. Both these options reference a state slider to find out what the initial state number for the rehearsed material should be. Single will return the program to that state and stay there. This mode allows rehearsal of small sections of the score for as long as is desired, without advancing to subsequent score states. In contrast, Free will revert to the state number given by the slider and commence performance of the piece from that point. In this mode, the succession of state changes will be executed as in a complete performance of the work, allowing rehearsals to start at any given point and continue from there.
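The difference between the two modes comes down to what happens when a cue arrives or a window expires. A minimal C sketch, with hypothetical names and with states assumed to be numbered consecutively, might read:

```c
typedef enum { REHEARSE_SINGLE, REHEARSE_FREE } RehearseMode;

/* State to use after a cue arrives or a window expires.
   last_state is the final state of the piece. */
int state_after_cue(RehearseMode mode, int current, int last_state)
{
    if (mode == REHEARSE_SINGLE)
        return current;   /* stay on the rehearsed state */
    /* Free mode proceeds as in a complete performance. */
    return current < last_state ? current + 1 : last_state;
}
```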

An Example

The example in figure 3.16 is taken from Flood Gate, a composition for violin, piano, and Cypher. In this example, we will examine a score-orientation state change associated with a pitch cue. In figure 3.16, at the beginning of measure 6, the A circled in the piano part is the target of the window associated with state 1. The opening of the window arrives some seconds before measure 6, approximately one measure earlier. The circled A is chosen as a target for two reasons: first, because it is an important event in the context of measure 6, one likely to be treated carefully by the pianist in any event; and, second, because there are no other appearances of that pitch in the immediate vicinity – particularly, none in measure 5, when the window opens.

Figure 3.16

The expected arrival time of the A is the focal point of the score-orientation window for state 1. The estimate of when this cue will arrive is made at the time of the previous state change, that associated with state 0. When state 0 is activated, the current absolute clock time is noted. The ideal temporal offset of the next cue (the one associated with state 1), measured from the activation of state 0, is added to the current time. The result is used as the estimated time of arrival of the following event. When the cue for state 1 arrives, an ideal offset to the cue for state 2 is again added to the current clock time.

Ideal cue arrival times are initially calculated from an examination of the score. Using the notated metronome marks as a guide, the temporal offset between one point in the score and another can be determined from the number of beats at a given tempo separating the two points. The rehearsal process will serve to bring these ideal estimates in line with the actual performance practice of a particular human ensemble. The offsets between cue points, and the window sizes surrounding each cue will typically be tuned during rehearsal to find a comfortable fit between Cypher’s score orienter and the other players.
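The arithmetic involved is simple: a span of b beats at a tempo of m beats per minute lasts b × 60000 / m milliseconds, and that offset is added to the clock time of the preceding state change. The C helpers below illustrate the calculation; their names are hypothetical and they are not drawn from Cypher's source.

```c
/* Milliseconds spanned by a number of beats at a metronome marking
   given in beats per minute, rounded to the nearest millisecond. */
long beats_to_ms(double beats, double bpm)
{
    return (long)(beats * 60000.0 / bpm + 0.5);
}

/* Estimated absolute arrival time of the next cue, given the clock
   time of the preceding state change and the ideal offset. */
long next_cue_eta(long state_change_ms, long ideal_offset_ms)
{
    return state_change_ms + ideal_offset_ms;
}
```

Four beats at a marking of 120, for example, give an ideal offset of 2000 milliseconds, which rehearsal may then lengthen or shorten to suit the ensemble.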

Returning to our example, the window associated with state change 1 remains open until approximately the beginning of measure 7. When the expected A is found, the chord shown in the computer part at the beginning of measure 7 is scheduled, as is the opening of the next window, associated with state 2, targeted for the F at the start of measure 8. The chord, which is notated in the score at the downbeat of measure 7, will be scheduled as an ideal offset from the arrival of the cue pitch A; in other words, if measure 6 is played exactly in tempo, the chord will arrive exactly on the downbeat of measure 7. The gesture is composed to allow for the arrival of the computer’s chord somewhere other than exactly on the beat, however; the musical idea is a swell in the violin and piano leading to a dynamic climax accentuated by the computer part. In some cases, even with score orientation, the computer takes the lead: this is a natural example, in that the arrival of the chord marks the high point of the crescendo. The human players quite naturally adjust their performance to accommodate their machine partner’s timing.

If the end of the window associated with state 1 is reached without finding the A, that is, if we reach measure 7 without seeing the target, the chord and state-2 window opening are scheduled anyway. In that case, the chord opening measure 7 will be quite late, which is usually a very telling signal to the human players that Cypher is confused. The F of state 2 will then probably be found close to the start of its window, however, since the scheduling of that window was initiated a little late, and the computer and human performers are then once again synchronized.

3.3 Cypher Performances

Several compositions were written during the development of Cypher. These works fell into two broad categories: largely notated pieces, and completely improvised performances. There are four notated works (which themselves all include some forms of improvisation), using human ensembles ranging from one to eight players. Musicians were constantly improvising with the program, usually on an informal basis, but on several occasions Cypher contributed to a more or less completely improvised performance in concert. In the notated works, human players are asked to perform at various times through either of two means of musical expression: interpretation of a written part, or improvisation. Interpretation comes into play during the performance of completely notated sections, in which case the players behave as the interpreters of Western concert music normally do: reading the music to be performed from the page, and rendering it with their own trained musical judgment. In other sections, the players are given the freedom to improvise, in keeping with the spirit of the composition. Usually, some constraints or indications of basic material are given to the players in the improvised sections; ideally, however, the improvisations are rehearsed and honed through the collective rehearsals of composer, performer, and computer.

During the evolution of Cypher, developments in my musical thinking have often been spurred by new formal possibilities the program provides. I have long been interested in developing ways to involve the player in shaping a composition during performance. The improvisational segments of pieces written with Cypher show an organic relation to the rest of the composition because the program knits together notated and improvised material in a consistent way. Responses to both sorts of material are generated identically, but the improvisational sections show great variation from performance to performance, while the notated parts stay relatively constant. Sharing material between the two styles of performance and weaving improvisational elaboration into an ongoing presentation of notated music, however, seem to make a coherent whole out of performances that might easily fall into irreconcilable pieces.

Notated Works

The first notated work I wrote with Cypher was Flood Gate (1989), a trio for violin, piano, and computer. The score includes conventionally notated music and some partially or completely improvisatory sections. The software development for this work concentrated on a number of the system’s fundamental performance tools: score orientation, studio to stage compositional refinement, and algorithmic responses to improvisation.

Whether the music is notated or improvised, the computer characterizes what it is hearing and uses that information to generate a musical response. In the completely notated sections, the output of the computer part is virtually identical from performance to performance, since the same material is being played into the same highly deterministic processes. There are some random numbers in these processes, but they are used in tightly constrained ways. For example, the ornamentation filter will add two pitches in close proximity to the input pitch. Although the ornamental pitches are chosen at random within a small ambitus, the effect of a small flourish around a target pitch is the perceived result; in other words, the use of random numbers is so tightly restricted that the relation of input to output is virtually deterministic. In the improvised sections, the output of the computer part is also a transformation of the human performer’s input: the transformations are a function of the features extracted from the music played to the computer. Each performance of the improvised sections is different; one goal of the composition was to marry composed and improvised elements in the same piece, using similar algorithmic techniques. Score orientation guided Cypher through sixty-one state changes in the course of a ten-minute piece, with no operator intervention.
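The ornamentation example shows how tightly constrained randomness can still yield a predictable gesture. The sketch below is an assumption-laden illustration in C, not the filter used in Cypher: the two-semitone ambitus, the placement of the ornaments before the target pitch, and the function name are all hypothetical.

```c
#include <stdlib.h>

#define AMBITUS 2  /* hypothetical: ornaments lie within 2 semitones */

/* Fill out[0..2] with two ornamental pitches followed by the target.
   Ornaments are chosen at random within AMBITUS semitones of the
   target and clamped to the MIDI range 0..127. */
void ornament(int pitch, int out[3])
{
    int i;
    for (i = 0; i < 2; i++) {
        int step = 1 + rand() % AMBITUS;   /* 1..AMBITUS semitones */
        int p = (rand() % 2) ? pitch + step : pitch - step;
        if (p < 0) p = 0;
        if (p > 127) p = 127;
        out[i] = p;
    }
    out[2] = pitch;
}
```

Whatever values the random number generator returns, the result is always a three-note flourish ending on the input pitch, which is the sense in which the relation of input to output is virtually deterministic.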

Most of the processing used in this piece took place on level 1 of both the listener and player. This being the first essay, the most immediate forms of analysis and reaction were used. The work succeeds as an example of interaction on a featural level but points out the advantage of being able to guide processing on higher musical levels as well: phrase length effects such as crescendo and accelerando are largely missing from the computer output. Figure 3.17 is a page from the score, illustrating the material with which the pianist was presented for her improvisation.

Figure 3.17

The next composition to use Cypher was a work for solo piano and computer, called Banff Sketches (1989–91). This piece was, appropriately enough, begun during my residency at the Banff Centre in Alberta, Canada, during the summer of 1989, in collaboration with the pianist Barbara Pritchard. Subsequent performances, including those at the SIG-CHI ’90 conference, the MIT Media Laboratory, a Collage concert of Media Laboratory works at Symphony Hall in Boston, and the ICMC 1991 in Montreal provided opportunities to refine and expand the composition.

One of the main extensions added to the program for Banff Sketches was alternation. This is a mode designed to emulate the well-known compositional and performance practice of taking turns to elaborate some common material; in jazz a typical application of the idea is “trading eights,” where two performers will alternate improvisations of eight bars each, commenting and expanding on ideas introduced by the other. For such an idea to work with Cypher, the program had to know when to store material from the human player and when to compose an elaboration of that material in an unaccompanied presentation. In Banff Sketches, alternation patterns were stored as part of the score-orientation cue list, which informed the program whose turn it was to perform. During the human part of an alternation, Cypher stored what was being played, unanalyzed. When the program was due to treat the same material, the stored input was sent through to the listener and transformed through message/method connections, in the same way as when the response is composed simultaneously with the performance.

Further, Banff Sketches provided the most extensive opportunity for improvisation of any of the principally notated works. In contrast to the organization of Flood Gate, improvisations are scattered throughout the composition, as commentaries and extensions to most of the musical ideas presented in the work. Computer solos often follow human/computer improvisatory collaborations, continuing to develop the material just played. Cypher’s performance in these sections included the first extensive use of the composition critic; modifications to the behavior of the program were not only a function of state changes coupled to score orientation, as in the case of Flood Gate or Sun and Ice, but also were effected by metalevel rules incorporated in the critic watching the composition section’s output.

An example of these ideas can be seen on page 12 of the score (figure 3.18); the first measure shown is a graphic representation of improvisational possibilities for the performer. The human performer may rely on these indications almost literally; a possible mapping of the graphic symbols to performance is provided with the score. If the player is more comfortable with improvisation, the symbols may be taken as little more than a suggestion. In all of the performances to date, the composer has had the luxury of working together with the performer to arrive at appropriate kinds of improvisation for each section.

Figure 3.18

Sun and Ice (1990) is a composition for four human performers and Cypher. The human ensemble performs on flute, oboe, clarinet, and a MIDI percussion controller. The percussion controller is directly connected to the program; the three wind players are tracked through a pitch-to-MIDI converter that takes an acoustic signal (all three wind instruments are played into microphones) and returns a MIDI representation of the pitches being played. Only one of the human players is ever being tracked by Cypher at a time; therefore, an important formal decision in the composition of the piece was mapping out which human performer would be interacting with the program when. The main software advance involved the use of compositional algorithms as seed material for introspective improvisation by the program. Structurally, the piece falls out into four big sections; at the end of the second of these, there is an extended improvisation section for the three wind players, followed by a computer solo. Here again we see the principle of a collaborative improvisation coupled with an elaboration of that material by Cypher.

The most extensive notated work was Rant (1991), a piece for flute, oboe, horn, trumpet, trombone, piano, double bass, soprano voice, and Cypher, based on a poem by Diane Di Prima. This piece was premiered on April 19, 1991, with soloist Jane Manning and conductor Tod Machover. The piece contrasted several sections for parts of the ensemble against other material presented by the group as a whole. One of the most challenging compositional problems presented by Cypher is the question of how to integrate it with a large ensemble. Full musical textures tend to integrate the contributions of many instruments into unified gestures or sets of voices. Adding an active computer complement to such a context can easily produce a muddled, unfocused result. In Rant, the solution adopted to the ensemble integration problem was to put the activity of the computer part in an inverse relation with the performance of the human ensemble. When the rest of the ensemble was performing together, Cypher adopted a strictly subsidiary role. In the numerous sections for smaller forces, the program came to the fore. In particular, two parts of the piece were written for soprano and computer alone, highlighting the interaction between the vocal material and Cypher’s transformations.


The following works used Cypher as a performer in a completely improvised setting: little if anything was agreed upon about the nature of the performance in advance, and the program had to contribute to the texture through its analysis capabilities and composition methods, which were either specified live, using the interface, or generated by the critic-driven composition rules.

Concerto Grosso #2 was the public debut of Cypher. This performance, part of the Hyperinstruments concert given at the Media Laboratory in June 1988, combined the improvisations of Richard Teitelbaum on piano, his improvisational software The Digital Piano, George Lewis on trombone, Robert Dick on flute, and Cypher. Routing of MIDI signals among the acoustic instruments (which were initially sent through pitch-to-MIDI converters) and computers was done onstage by Richard Teitelbaum. A subsequent, reduced performance was given by Richard Teitelbaum and Robert Rowe at the Concertgebouw in Amsterdam in the fall of 1988.

At the Banff Centre in the summer of 1989, a completely improvised musical performance, Universe III, was given by human and machine improvisers. The human players were Steve Coleman on saxophone and synthophone, and Muhal Richard Abrams on MIDI piano. The machine players were Coleman’s improvisation program and Cypher. Configuration of the MIDI streams was done onstage, so that the input to Cypher changed continually between the other human and machine performers and itself. The most intriguing result of this performance and the rehearsals leading up to it was the speed with which the jazz players were able to pick up on the transformations tied to features of their playing and to trigger them at will. Connections between the Cypher listener and player were made onstage: a human operator (myself) performed state changes and other major modifications to program behavior.

On a concert given at the MIT Media Laboratory on April 19, 1991, another collaboration combined the live synthophone playing of Steve Coleman with additional voices supplied by his own improvisation software and Cypher. The synthophone’s MIDI output was connected to three computers: one running Coleman’s software, a second running Cypher, and a third playing alto saxophone samples. Since the Banff Centre performance, Coleman’s software improviser has been extended to track live playing and find harmonic structure in it. The harmonies found are then used to direct the pitch selection of the melodic improvisation part. In this performance, Cypher was run to a large extent by an internal software critic. The normal production rules, adjusting the output of the composition section just before performance, remained in force. Additional rules examined the analysis of Coleman’s playing and made or broke connections maintained in the connections manager between listener reports and compositional methods. Further, extensive use was made of the harmonic and rhythmic adjustment transformations, tracking the behavior of the synthophone performance to pick up the harmonic context and beat period, then aligning the output of Cypher’s composition section to accord with it. The performance of the beat alignment transformation was particularly noticeable; once the beat tracker locked onto Coleman’s pulse, the composition section achieved an effective rhythmic coordination with the live playing.

3.4 Hyperinstruments

The hyperinstruments project, led by Tod Machover and Joseph Chung, has been carried out in the Music and Cognition group of the MIT Media Laboratory. “Our approach emphasizes the concept of ‘instrument,’ and pays close attention to the learnability, perfectibility, and repeatability of redefined playing technique, as well as to the conceptual simplicity of performing models in an attempt to optimize the learning curve for professional musicians” (Machover and Chung 1989, 186). Hyperinstruments are used in notated compositions, where an important structural element is the changing pattern of relationships between acoustic and computer instruments. As the preceding quote indicates, a focal point in developing hyperinstruments for any particular composition has been to provide virtuoso performers with controllable means of amplifying their gestures. Amplification is here used not in the sense of providing greater volume, but in suggesting a coherent extension of instrumental playing technique, such that a trained concert musician can learn sophisticated ways of controlling a computerized counterpart to his acoustic sound. One important expression of this concept has been to provide sophisticated control over a continuous modification of the synthetic timbres used to articulate hyperinstrument output.

Hyperinstruments are realized with the Hyperlisp programming environment, developed by Joseph Chung. Hyperlisp is available on the companion CD-ROM. Based on Macintosh Common Lisp Second Edition and the Common Lisp Object System (CLOS), Hyperlisp combines the power and flexibility of Lisp with a number of extensions designed to accommodate the scheduling and hardware interfacing needs of musical applications. Most of this functionality is built into an extensive library of objects, which have proved general enough to realize several quite different compositions, as we shall see shortly.

As the name would suggest, hyperinstruments follow an instrument paradigm. Applications are usually score driven, even though full-blown score following is generally not employed, and the generation technique can best be described as a hybrid of generative and sequenced styles. The music to be played by the human performers is notated quite precisely, allowing latitude in expressive performance but not, generally, in the improvisation of notes and rhythms. Coordination with the computer is accomplished through a cueing system, where cues are sometimes taken from the MIDI representation of a performance and sometimes from the intervention of a computer operator.


The first composition to make use of the Hyperlisp/hyperinstruments model is Valis, an opera based on the novel of the same name by Philip K. Dick. The music of the opera, beyond the vocal parts, is realized by two hyperinstrument performers, one playing piano, the other mallet percussion, where both instruments send MIDI streams to the computer for further elaboration. Several instruments were developed for different sections of the composition; here much of the groundwork was laid for the kinds of interaction that were to be developed in later compositions.

For example, one instrument animates chords performed on the keyboard with quickly changing rhythmic sequences applied to an arpeggiation of the notes of the chord. The performer controls movement from one chord to the next by playing through the score, and, in an extension of normal technique, can affect the timbral presentation of the arpeggios by applying and releasing pressure on the held notes of the chord. Moreover, the program is continually searching through a database of chords to find one matching what is being played on the keyboard; successful matches are used to move from one rhythmic sequence to another. In that way, the example is score-driven: an internal score is matched against performer input, and successful matches move the computer performance from one rhythmic sequence to the next. The generation is a hybrid of sequenced and generative techniques, since the stored rhythmic score is a sequence, but one that is generatively modified, in timbral presentation and rate of change from one sequence to the next, by the performance of the human.

Towards the Center

Towards the Center (1988–89) is scored for flute, clarinet, violin, violoncello, keyboard, percussion, and computer system. Of these, the strings and woodwinds are miked and slightly processed, while the keyboard and percussion parts are full-blown hyperinstruments. In this piece, the hyperinstruments sometimes form “double instruments,” in which control over the sounding result is shared between the two performers.

The instruments are designed so that each musician can influence certain aspects of the music, but both players are required to perform in ensemble to create the entire musical result. In one variation, the notes on the percussionist’s instrument are re-mapped to the pitches which are currently being played on the keyboard. Each octave on the percussion controller sounds the same notes but with different timbres. In addition, the keyboard player uses polyphonic afterpressure to weight the mapping so that certain notes appear more frequently than others. (Machover and Chung 1989, 189)
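The shared control described in this variation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hyperlisp code: the function name `remap_percussion`, the use of a weighted random choice to realize the "weighting" of the mapping, and the reading of the percussion octave as a timbre index are all hypothetical.

```python
import random

def remap_percussion(perc_note, held, rng=random):
    """Map a percussion strike onto a pitch currently held on the keyboard.

    held maps keyboard pitch -> polyphonic afterpressure (0-127).
    Returns (pitch, timbre_index), or None when no keys are held.
    Hypothetical sketch: the weighted random choice is an assumption.
    """
    if not held:
        return None
    pitches = list(held)
    # Add 1 so that a key held with zero pressure can still be chosen.
    weights = [held[p] + 1 for p in pitches]
    pitch = rng.choices(pitches, weights=weights, k=1)[0]
    # Each octave of the controller sounds the same notes with another timbre.
    timbre_index = perc_note // 12
    return pitch, timbre_index
```

In this reading, pressing harder on one key of a held chord simply raises the odds that the percussionist's next strike will sound that pitch.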

Hyperlisp maintains a running time grid, which allows various kinds of synchronization between performers and the computer part. For example, sequenced scores triggered from the live performance can each be synchronized with the grid, and thereby to each other, without requiring a perfectly timed trigger from the player. Similarly, the performance of the two human players can be pulled into synchronization by aligning incoming attacks with beats of the grid. Tolerances for the strength of attraction to the grid points, and for the speed with which misaligned events are pulled into synch, allow flexible modification of the temporal performance in a variety of situations.
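The attraction of incoming attacks to the grid might be sketched as follows. This is a minimal Python illustration rather than the Hyperlisp implementation; the function name `pull_to_grid` and the parameters `tolerance` and `strength` are assumptions introduced here to stand in for the tolerances mentioned above.

```python
def pull_to_grid(onset, period, tolerance, strength):
    """Move an onset time toward the nearest grid point.

    onset, period, and tolerance are in seconds. strength runs from
    0.0 (no pull) to 1.0 (snap exactly onto the grid). All names are
    hypothetical stand-ins for the tolerances described in the text.
    """
    nearest = round(onset / period) * period
    offset = nearest - onset
    if abs(offset) > tolerance:
        # Too far from any grid point: leave the attack where it was played.
        return onset
    return onset + strength * offset
```

With a strength below 1.0, a misaligned attack is moved only part of the way, which is one simple way of modeling a gradual rather than an immediate pull into synchronization.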


The composition Bug-Mudra is scored for two guitars, one acoustic and one electric (both outfitted with MIDI transmission systems), MIDI percussion, and hyperinstrument system. A tape part for the piece is a recording of a sequence, mixed to tape in a single pass and striped with a SMPTE time code track, allowing the subsequent live computer treatment to coordinate with the score in performance to within 1/100 of a second. A transformation of the SMPTE information to MIDI Time Code format is used as the clock driving a Hyperlisp application, enforcing the synchronization between the tape and live hyperinstrument processing. This synchronization is further expressed through the generation of an inaudible time grid, a reference point that is used to coordinate live performance, the tape, and much of the computer output. For example, the time grid is used to quantize the performance of the MIDI percussion player. Subtly shifting attacks coming from the percussion controller to fall exactly on a time point from the grid allow the performer to concentrate on timbral and gestural shaping of the performance rather than on the precise placement of each note in an extremely fast and demanding part.

A performer as yet unmentioned in Bug-Mudra is the conductor, whose left hand is outfitted with a Dextrous Hand Master (DHM) able to track the angle of three joints on each finger, as well as the angle of each finger relative to the back of the hand. The DHM is used to extend the traditional function of a conductor’s left hand, which generally imparts expressive shaping to the performance, in contrast to the more strictly time-keeping function of the right hand. The DHM in Bug-Mudra shapes and mixes timbres in the course of live performance, through an interpretation of the conductor’s hand gestures that serves to operate a bank of MIDI-controllable mixer channels.

The tape part of Bug-Mudra enforces a particular temporal synchrony not found in the other hyperinstrument compositions. Restrictive in the sense of possible performance tempi, the tape is actually liberating in terms of the precision with which computer and human rhythmic interaction can be realized. Rapidly shifting metric structures underpinning a highly energetic ensemble texture are a hallmark of the piece. SMPTE synchrony permits score following between the tape and computer to take place with centisecond resolution. Gestures from the conductor and electric guitar player, in particular, spawn expressive variation in the realization of the electronic part. The environment of Bug-Mudra can be thought of as an instrument, but one that is played by the coordinated action of all three performers plus conductor.

Begin Again Again . . .

The composition Begin Again Again . . . (1991) was written for hypercello and computer, on a commission from Yo-Yo Ma. For this piece, a great deal of work was done to improve the acquisition of information about the live cello performance (see section 2.1.2 for a description of the hardware enhancements). MIDI is oriented toward a note paradigm, which represents essentially discrete events. The extension of the standard to continuous control parameters covers level settings, in essence, since each new value is used to change the level of some synthesis parameter directly, for example, pitchbend. Therefore, only the current value of the continuous control is of interest – there is no reason to try to characterize the shape of change over time.

Tracking performance gestures on a stringed instrument, however, does require the interpretation of parameters varying continuously. Identifying bowing styles from instantaneous levels, for example, is not possible: the system must be sampling and integrating readings taken over time. A number of objects were added to Hyperlisp to interpret various kinds of performance gestures, such as note attacks, tremoli, and bowing styles. These objects can be “mixed in” to hyperinstrument modes active at different parts of the piece; rather than having all gestures tracked at all times, gestures with significance for particular sections of the composition can be interpreted as needed. Because of the powerful tracking techniques developed for continuous control and the previously mentioned difficulties of determining the pitch of an audio signal in real time, the gestural information available from a performance of Begin Again Again . . . is virtually the inverse of the normal MIDI situation: continuous controls are well represented, whereas pitch and velocity are more approximate.

Necessity became a virtue as the piece was composed to take advantage of the depth and variety of continuous gestural information available. For example, one section of the piece uses an “articulation mapping,” in which different signal processing techniques are applied to the live cello sound, according to the playing style being performed. Tremolo thus is coupled to flanging, pizzicato to echo, and bounced bows to spatialization and delays. Further, once a processing technique has been selected, parameters of the processing algorithm can be varied by other continuous controls. Once tremolo playing has called up the flanger, for instance, the depth of chorusing is controlled with finger pressure on the bow (Machover et al. 1991, 29–30).
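The articulation mapping amounts to a lookup from recognized playing style to processing technique. The pairings in the following Python sketch come from the description above; the table name, the function name, and the string labels are hypothetical, and nothing here reflects how Hyperlisp actually represents the mapping.

```python
# Playing style -> signal processing techniques, as described in the text.
ARTICULATION_TO_PROCESSING = {
    "tremolo": ["flanging"],
    "pizzicato": ["echo"],
    "bounced bow": ["spatialization", "delay"],
}

def select_processing(articulation):
    """Return the processing techniques coupled to a playing style.

    An unrecognized style selects no processing (an assumption).
    """
    return ARTICULATION_TO_PROCESSING.get(articulation, [])
```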

3.5 Improvisation and Composition

Interactive systems have attracted the attention of several musicians because of their potential for improvisation. Improvised performances allow a computer performer (as well as the human performers) broad scope for autonomous activity. Indeed, the relative independence of interactive improvisational programs is one of their most notable distinguishing characteristics. The composer Daniel Scheidt articulates his interest in incorporating improvised performances thus: “In making the decision to allow the performer to improvise, I have relinquished aspects of compositional control in exchange for a more personal contribution by each musician. An improvising performer works within his or her own set of skills and abilities (rather than those defined by a score), and is able to offer the best of his or her musicianship. Once having become familiar with [an interactive] system’s behavior, the performer is free to investigate its musical capabilities from a uniquely personal perspective” (Scheidt 1991, 13).

Further, improvisation demands highly developed capacities for making sense of the music, either by a human operator onstage or by the program itself. If a human is to coordinate the responses of the machine with an evolving musical context, a sophisticated interface is required to allow easy access to the appropriate controls. If the machine organizes its own response, the musical understanding of the program must be relatively advanced.

George Lewis

A strong example of the extended kinds of performance interactive systems make possible is found in the work of George Lewis, improviser, trombonist, and software developer. His systems can be regarded as compositions in their own right; they are able to play with no outside assistance. When collaboration is forthcoming, human or otherwise, Lewis’s aesthetic requires that the program be influenced, rather than controlled, by the material arriving from outside. The system is purely generative, performance driven, and follows a player paradigm. There are no sequences or other precomposed fragments involved, and all output is derived from operations on stored elemental material, including scales and durations. Interaction arises from the system’s use of information arriving at the program from outside sources, such as a pitch-to-MIDI converter tracking Lewis’s trombone. A listening section parses and observes the MIDI input and posts the results of its observations in a place accessible to the routines in the generation section. The generation routines, using the scales and other raw material stored in memory, have access to, but may or may not make use of, the publicly broadcast output of the listening section (Lewis 1989). Probabilities play a large role in the generation section; various routines are always available to contribute to the calculations, but are only invoked some percentage of the time – this probability is set by the composer or changes according to analysis of the input. The probability of durational change, for example, is related to an ongoing rhythmic complexity measure, so as to encourage the performance of uneven but easily followed sequences of durations.
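The probabilistic invocation of generation routines can be illustrated with a short Python sketch. It is not drawn from Lewis's software: the function names are hypothetical, and the particular relation between rhythmic complexity and the probability of durational change (a linear decrease, so that a complex stretch is less likely to change duration again) is purely an assumption made for the example.

```python
import random

def maybe_invoke(routine, probability, rng=random):
    """Call routine() only some percentage of the time."""
    if rng.random() < probability:
        return routine()
    return None

def duration_change_probability(complexity, floor=0.1, ceiling=0.9):
    """Assumed relation: the more complex the recent rhythm (0.0 to 1.0),
    the lower the probability of changing duration again."""
    complexity = min(max(complexity, 0.0), 1.0)
    return ceiling - (ceiling - floor) * complexity
```

A generation loop built this way would consult every routine on every pass, while only a changing subset actually contributes to the output.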

George Lewis’s software performers are designed to play in the context of improvised music. The intent is to build a separate, recognizable personality that participates in the musical discourse on an equal footing with the human players. The program has its own behavior, which is sometimes influenced by the performance of the humans. The system’s success in realizing Lewis’s musical goals, then, follows from these implementation strategies: the generative nature of the algorithm ensures that the program has its own harmonic and rhythmic style, since these are part of the program, not adopted from the input. Further, the stylistic elements recognized by the listening section are made available to the generation routines in a way that elicits responsiveness but not subordination from the artificial performer.

Richard Teitelbaum

Richard Teitelbaum’s Digital Piano collection is an influential example of a transformative, performance-driven, instrument paradigm interactive system. The setup includes an acoustic piano fitted with a MIDI interface, a computer equipped with several physical controllers, such as sliders and buttons, and some number of output devices, often including one or more solenoid-driven acoustic pianos. This particular configuration of devices arose from Teitelbaum’s long experience with live electronic music, beginning with the improvisation ensemble Musica Elettronica Viva in 1967. Two motivations in particular stand out: (1) to maintain the unpredictable presence of human performers, and (2) to use an acoustic instrument as a controller, to contrast with the relatively static nature of much synthesized sound (Teitelbaum 1991).

Using the Patch Control Language, developed in collaboration with Mark Bernard (Teitelbaum 1984) (or, more recently, a reimplementation in Max of the same design), Teitelbaum is able to route musical data through a combination of transformations, including delays, transpositions, and repetitions. Before performance, the composer specifies which transformation modules he intends to use, and their interconnections. During performance, material he plays on the input piano forms the initial signal for the transformations. Further, he is able to manipulate the sliders to change the modules’ parameter values and to use buttons or an alphanumeric keyboard to break or establish routing between modules. In this case, the musical intelligence employed during performance is Teitelbaum’s: the computer system is a wonderful generator of complex, tightly knit musical worlds, but the decision to change the system from one configuration to another is the composer’s. The computer does not decide on the basis of any input analysis to change its behavior; rather, the composer/performer sitting at the controls actively determines which kinds of processing will be best suited to the material he is playing.

A new version of Patch Control Language, written in Max by Christopher Dobrian, has extended Teitelbaum’s design. The transposer subpatch is shown in figure 3.19. Across the top are three inlets, which will receive pitch, velocity, and transposition values from the main patch. When the velocity value is nonzero (that is, the message is a Note On), the transposed pitch corresponding to the pitch played is saved in the funbuff array. Notes with a velocity of zero are not passed through the gate object and so call up the stored transposition: in this way, even if the transposition level changes between a Note On and Note Off, the correct pitch will be turned off.


Figure 3.19
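The Note On and Note Off bookkeeping of the transposer can be restated outside Max. The Python sketch below is an approximation under stated assumptions: the class name `Transposer` and its method are hypothetical, and a dictionary plays the role that the funbuff array plays in the patch.

```python
class Transposer:
    """Remember the transposed pitch sent for each Note On so that the
    matching Note Off silences the same pitch."""

    def __init__(self):
        # played pitch -> transposed pitch (stands in for funbuff)
        self.sounding = {}

    def process(self, pitch, velocity, transposition):
        if velocity > 0:
            # Note On: transpose and store the result under the played pitch.
            out = pitch + transposition
            self.sounding[pitch] = out
        else:
            # Note Off: recall the stored pitch, ignoring the current
            # transposition level. Fall back to a fresh transposition
            # if no Note On was seen (an assumption).
            out = self.sounding.pop(pitch, pitch + transposition)
        return out, velocity
```

If the transposition level moves from 7 to -5 while middle C is held, the Note Off for pitch 60 still turns off pitch 67.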

Another module, the excerpter, is shown in figure 3.20. The excerpter is built around the sequencer object visible on the right-hand side of the patch, between midiformat and midiparse. The rightmost inlet, at the top of the patch, sends packed pitch and velocity pairs from MIDI Note On and Off messages to the sequencer for recording. The next inlet from the right, marked “playback tempo,” supplies exactly that: scaled through the split and expr objects, this inlet eventually sets the argument of a start message. The next inlet to the left sets a transposition level for pitches played out of the sequencer and simultaneously fires the start message, which begins playback from the sequencer at the specified tempo. Finally, the leftmost inlet is used to start and stop recording into the seq object.

Figure 3.20


This again is indicative of the transformative approach embodied by Richard Teitelbaum’s improvisation software: the excerpter takes in music played during performance, which can then be played back out under a variety of transformations, including tempo change and transposition. Added to this is the elaboration that up to four identically functioning tracks can be operating simultaneously, yielding several layers of differently varied material.

Richard Teitelbaum’s software, like George Lewis’s, is designed to function in an improvisational context. The musical intent is to establish a dynamic, expressive field of related material. Expressivity is transmitted through the system by maintaining the performer’s input, subject to elaborative variations. The technique of chaining together relatively simple transformation modules allows the composer to produce variations along a continuum of complexity, where the number and type of transformations chained together directly affect the degree and kind of complexity in the output. Similarly, the output is more or less recognizably related to the input as a function of the number of processes applied to it. The system tends to resemble a musical instrument more than a separate player, since the decision to change its behavior is made by the composer during performance, rather than by the program itself.

Jean-Claude Risset

Jean-Claude Risset’s Duet for One Pianist uses an interactive system programmed with Max and is written for the Yamaha Disklavier played as if by four hands. That is to say that a human performer plays two hands’ worth of the piece, and MIDI data from that performance is transmitted to a computer running Max, which applies various operations to the human-performed data and then sends out MIDI commands to play two more hands’ worth on the same piano. For example, the section “Double” plays sequences in response to notes found in the score, and “Fractals” adds several notes in a quasi-octave relationship to each note played (Risset 1990).

Figure 3.21


The Max patch was written by Scott Van Duyne at the MIT Media Laboratory; the subpatch in figure 3.21 is from the “Mirrors” section of the work, which casts notes played by the human back around a point of symmetry to another part of the piano (“Mirrors” can be heard on the companion CD-ROM; the subpatch resides there as well). The opening is a quotation from the second movement of Webern’s Op. 27, which uses the same principle. We can see from the patch that the basic algorithm is quite simple: the right inlet receives a point of symmetry (expressed as a MIDI note number), and the left inlet the performed MIDI notes. The point of symmetry is multiplied by 2, and each incoming pitch is subtracted from that product. The result places the new pitch at the same interval from the symmetry point as the original note, but in the opposite direction.
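The arithmetic of the “Mirrors” subpatch is compact enough to state directly. The following Python function is a restatement of the calculation described above, not a transcription of the Max patch; the function name is hypothetical.

```python
def mirror(pitch, symmetry_point):
    """Reflect a MIDI pitch around a point of symmetry.

    Both arguments are MIDI note numbers. The result lies as far from
    the symmetry point as the input, on the opposite side.
    """
    return 2 * symmetry_point - pitch
```

With a symmetry point of 60, the E a major third above (64) is answered by the A-flat a major third below (56). The sketch does not guard against results outside the MIDI range of 0 to 127, which a complete version would need to handle.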

The subpatch of figure 3.22, called transpup, is at the heart of the “Fractals” section. Five of these subpatches are active simultaneously, casting pitches played by the human up the piano in differing intervallic relationships. The resulting structures are fractal in their self-similarity but also related to Shepard tones in their seeming combination of motion and stasis. The transposition and delay inlets to transpup are simply passed through to the first two outlets, allowing the outside patch to monitor what has been passed through. A simple pipe object delays transmission of the pitch and velocity inlets by the same duration, set by the “delay.” “Fractals” can be heard on the companion CD-ROM.

Figure 3.22
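The layering of several transpup instances might be sketched in Python as follows. The function names, the representation of a note as a (pitch, velocity, time) triple, and the example interval and delay values are all assumptions; the actual settings used in the piece are not given here.

```python
def transpup(pitch, velocity, time_ms, transposition, delay_ms):
    """One layer: shift a note up by an interval and delay it.

    Pitch and velocity are delayed by the same duration, as with the
    pipe object in the patch.
    """
    return pitch + transposition, velocity, time_ms + delay_ms

def fractal_layers(pitch, velocity, time_ms, settings):
    """Apply several transpup layers to a single played note.

    settings is a list of (transposition, delay_ms) pairs, one per layer.
    """
    return [transpup(pitch, velocity, time_ms, tr, d) for tr, d in settings]
```

Calling `fractal_layers(60, 90, 0, [(13, 100), (25, 200)])` would answer a middle C with two later notes, each slightly wider than an octave multiple above it, which is one plausible reading of a quasi-octave relationship.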


Risset’s work forms an interesting contrast with the systems of Teitelbaum and Lewis; although an extensive network of interactive relations is explored, the Duet is a completely notated piece of concert music, rather than an improvisation. It is usually performance driven but sometimes tracks portions of the human part more closely, in a score-driven style. In the performance-driven sections, particular pitches on the piano often advance the behavior of the Max patch from one state to the next. The piece clearly follows a player paradigm; the title alone is evidence of as much. All three types of response method are in use.

David Jaffe and Andrew Schloss

David Jaffe and Andrew Schloss have explored several styles of interaction in live computer music, as composers, players, and in duet performances with each other. Their piece Wildlife is an ensemble interaction for four participants, “two biological and two synthetic” (Jaffe and Schloss 1991). Two computers, a NeXTstation and a Macintosh IIci, are running separate interactive programs. David Jaffe performs on a Zeta electric MIDI violin, and Andrew Schloss on the Boie Radio Drum (see section 2.1.3). The Macintosh tracks and processes MIDI coming from the violin and drum and passes MIDI commands to an on-board SampleCell and to the NeXT computer. The NeXT performs further processing of the MIDI stream coming from the Macintosh, executes some compositional algorithms, and controls synthesis on a built-in DSP 56001 and a Yamaha TG-77 synthesizer.

Each of the five movements of the work implements a different kind of interaction. In the first, the violin triggers piano sounds through a chord-mapping scheme, in which certain pitch classes are coupled with chords and octave information is used to control the chords’ registral placement. The chord maps are chosen by the drum, producing a double instrument between the two players: each is responsible for some part of the sounding result. Further, the drum has its own performance possibilities, and the violinist can choose to step out of the chord-mapping mode and play single notes or send pitchbend controls to the drum sounds. This situation is indicative of the musical environments of Wildlife – all four ensemble members are responsible for part of the resulting texture, and several combinations of interaction are possible. Part of the performance, then, becomes the choice of interaction mode and the way ensemble members choose to affect each other’s action.
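The chord-mapping scheme of the first movement can be sketched in Python. The chord maps below are invented for illustration, as are the names `CHORD_MAPS` and `violin_note_to_chord`; only the general principle (pitch class selects the chord, octave places it, and the drum chooses the map) is taken from the description above.

```python
# Hypothetical chord maps: pitch class -> chord offsets within the octave.
CHORD_MAPS = {
    "map_a": {0: [0, 4, 7], 7: [7, 11, 14]},
    "map_b": {0: [0, 3, 7, 10]},
}

def violin_note_to_chord(midi_note, map_name):
    """Turn a violin note into a chord in the same register.

    map_name stands for the chord map currently chosen by the drum.
    """
    octave, pitch_class = divmod(midi_note, 12)
    chord = CHORD_MAPS[map_name].get(pitch_class)
    if chord is None:
        # Pitch class not coupled with a chord: nothing is triggered.
        return []
    return [12 * octave + offset for offset in chord]
```

The same violin C therefore sounds a different chord depending on which map the drummer has selected, which is the sense in which the two players share one instrument.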

The second movement is again built on a subtle interaction between the violin and drum players. The violin triggers sustained organ sounds when a foot pedal is depressed. Over the drone, the violin can play additional acoustic sounds without triggering more organ sounds until the foot pedal is pressed again. Meanwhile, the drum has access to the material played by the violin and can color it timbrally. Either single notes from the violin performance can be chosen and played with various timbres, or a continuous mode will play back a series of notes from the violin material, with the tempo determined by the drummer holding a stick above the surface of the drum, like a conductor. Similarly, in the fourth movement pizzicato playing by the violin sparks no computer response itself, but the material is sent on to the drum, where it will be elaborated by the percussionist. In the last movement, the radio drum’s x and y coordinates are mapped onto a continuous DSP vowel interpolation space, whereas the z dimension is mapped to loudness. At the close, all four ensemble members contribute to a bluegrass fantasy.

Wildlife is a performance-driven interactive system. A hybrid generation method including transformative and generative algorithms is ingeniously distributed between the two human players and two computers. The system seems close to the instrument paradigm: the computer programs become an elaborate instrument played jointly by the violin and percussion. In that respect, some modes of interaction in Wildlife seem close to the “double instrument” techniques of Tod Machover’s Towards the Center. Unlike that piece, the primary style of interaction is improvisation, in which various strategies of collaboration and sabotage play a guiding role in the formation of the work.

David Wessel

David Wessel has worked extensively in improvisational settings, using custom software written in Max (Wessel 1991). The software is a collection of listening assistants, composing assistants, and performing assistants. It has usually been used in duo improvisation settings, where one player improvises on an acoustic instrument and Wessel affects the interactive environment with some MIDI controller, such as the Buchla Thunder. Both acoustic and MIDI signals are captured from the acoustic player; the computer performer is able, with the help of the listening assistants, to record and analyze the acoustic performance in real time. Then, the composing and performing parts are used to formulate a response and adjust it to the demands of the concert situation.

One of the fundamental ideas of the interaction is that the computer performer is able to record phrases from the acoustic performer onstage. In early versions, a MIDI keyboard “was configured so that the lowest octave functioned as a phrase recorder. Recording was initiated when a key in this octave was depressed and stopped when it was released. The stored fragment was associated with that key and the corresponding keys in the other octaves of the keyboard were used to play back transformations of the recorded phrase” (Wessel 1991, 345).

One complication with this model is that the computer operator is not able to indicate that a phrase should be saved until it has already begun to be played. To be able to transform appropriate phrases, the software must be able to store ongoing musical information and “reach back” to the beginning of phrases when they are called for. For this, Wessel divides the program’s memory into short-term and long-term stores. The short-term memory holds MIDI-level information, whereas the long-term memory uses higher-level abstractions. With the short-term store, listening assistants perform phrase boundary detection, to notice those structural points that will demarcate the onset of saved phrases. Following the grouping mechanisms of Lerdahl and Jackendoff (1983), Wessel uses a time-gap detector to indicate phrase boundaries.
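A time-gap detector and the “reach back” it makes possible can be sketched briefly. The Python below is an illustration under stated assumptions, not Wessel's Max implementation: the function names are hypothetical, the gap is measured between successive onsets (it might equally be measured from release to onset), and a single fixed threshold stands in for whatever criteria the listening assistants actually apply.

```python
def phrase_starts(onsets_ms, gap_threshold_ms):
    """Return the indices of events that begin a new phrase.

    An event starts a phrase if it is the first event, or if it follows
    the previous onset by more than the threshold.
    """
    starts = []
    for i, t in enumerate(onsets_ms):
        if i == 0 or t - onsets_ms[i - 1] > gap_threshold_ms:
            starts.append(i)
    return starts

def reach_back(events, gap_threshold_ms):
    """Return the most recent phrase from a short-term buffer.

    events is a list of (onset_ms, pitch) pairs in time order.
    """
    onsets = [t for t, _ in events]
    starts = phrase_starts(onsets, gap_threshold_ms)
    return events[starts[-1]:] if starts else []
```

When the operator asks for a phrase that is already under way, `reach_back` recovers everything played since the last detected boundary.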

From this we can already see that Wessel’s system is a performance-driven one. Transformation is an important response type: one method generates chords from melodic material by maintaining a weighted histogram of performed pitches and favoring those related to an estimated tonal context. In other words, notes from a melodic line are run through the histogram, and the pitches with most resonance in the harmonic field are employed to generate accompanying chords. Pitches extraneous to the current harmony are discarded.
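
A rough sketch of the histogram idea follows. The source does not give the details of Wessel's weighting or of the tonal context estimate, so the decay factor, the velocity weighting, and the rule of keeping the three strongest pitch classes are all assumptions made for illustration.

```python
class PitchHistogram:
    """Decaying histogram of performed pitch classes (illustrative only)."""

    def __init__(self, decay=0.9):
        self.decay = decay          # assumed forgetting factor per new note
        self.weights = [0.0] * 12   # one weight per pitch class

    def add(self, midi_pitch, velocity=64):
        # Older notes fade so the histogram tracks the current harmonic field.
        self.weights = [w * self.decay for w in self.weights]
        self.weights[midi_pitch % 12] += velocity / 127.0

    def chord(self, size=3):
        """Return the strongest pitch classes; weaker ones are discarded."""
        ranked = sorted(range(12), key=lambda pc: self.weights[pc], reverse=True)
        return sorted(pc for pc in ranked[:size] if self.weights[pc] > 0.0)

histogram = PitchHistogram()
for pitch, velocity in [(60, 64), (64, 64), (67, 64), (61, 10), (60, 64)]:
    histogram.add(pitch, velocity)
chord = histogram.chord()  # the quiet passing C-sharp does not survive
```

Here the softly played pitch 61 carries too little weight to enter the chord, which is one simple reading of discarding pitches extraneous to the current harmony.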

Clearly, this approach represents a performer paradigm system. In (Wessel 1991), however, the composer articulates a vision of interactive performance similar to the “dual instrument” models of Jaffe and Schloss, or Machover’s Towards the Center: following the example of Stockhausen’s Mikrophonie I, David Wessel envisages new computer environments where the performers and computer are jointly involved in the generation of sound. Rather than an additive superimposition of individual sound sources, one source collectively controlled, in Wessel’s view, could offer new possibilities of interaction on both technical and musical levels.

Cort Lippe

The composer Cort Lippe has been involved in many of the projects realized at IRCAM using Max’s score-following and signal-processing capabilities. His own composition, Music for Clarinet and ISPW (1992), exemplifies that experience. In the signal flow chart shown in figure 3.23, we see the various levels of processing involved in the realization of the piece. First of all, the clarinet is sampled through an ADC and routed through the pitch tracker resident on an IRCAM i860 DSP board (ISPW) mounted on the NeXT computer (see section 2.3.2). The output of the pitch tracker goes on to a score-following stage, accomplished with the explode object. Index numbers output from explode then advance through an event list, managed in a qlist object, which sets signal-processing variables and governs the evolution of a set of compositional algorithms.

The signal processing accomplished with the ISPW is quite extensive and includes modules for reverberation, frequency shifting, harmonization, noise modulation, sampling, filtering, and spatialization. In fact, all of the sounds heard from the computer part during the course of the piece are transformations of the live clarinet performance. The processing routines listed above can pass signals through a fully connected crossbar, such that the output of any module can be sent to the input of any other. These routines receive control information from the qlist and from a set of compositional algorithms, which “are themselves controlled by every aspect of the clarinet input: the raw clarinet signal, its envelope, the pitch tracker’s continuous control information, the direct output of the score follower, and the electronic score’s event list all contribute to [their] control” (Lippe 1991b, 2).
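
A fully connected crossbar can be pictured as a gain matrix with one entry for every source and destination pair. The sketch below is not the ISPW patch; the module identifiers and the single-sample routing step are hypothetical, chosen only to show how any output can feed any input.

```python
# Hypothetical identifiers for the processing modules named in the text.
MODULES = ["reverb", "freq_shift", "harmonizer", "noise_mod",
           "sampler", "filter", "spatializer"]

class Crossbar:
    """Gain matrix connecting every module output to every module input."""

    def __init__(self, names):
        self.names = list(names)
        # gain[src][dst] == 0.0 means the connection is closed.
        self.gain = {src: {dst: 0.0 for dst in self.names} for src in self.names}

    def connect(self, src, dst, gain=1.0):
        self.gain[src][dst] = gain

    def route(self, outputs):
        """Given each module's latest output sample, return each module's next input."""
        return {
            dst: sum(self.gain[src][dst] * outputs.get(src, 0.0)
                     for src in self.names)
            for dst in self.names
        }

crossbar = Crossbar(MODULES)
crossbar.connect("harmonizer", "reverb", 0.5)
crossbar.connect("filter", "reverb", 1.0)
inputs = crossbar.route({"harmonizer": 0.2, "filter": 0.1})
```

In such a scheme an event list like the one held in the qlist would only need to rewrite entries of the gain matrix to reconfigure the signal path between sections.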

Figure 3.23


Cort Lippe’s Music for Clarinet and ISPW is a score-driven interactive music system. The explode/qlist score-following combination described in section 3.2.2 is used to track the clarinet player’s progress through a notated score. The signal-processing facilities in the extended version of Max written for the ISPW allow extensive timbral transformations of the live clarinet performance, as well as pitch and envelope tracking to feed the score-following mechanism. The response methods are transformative and generative, but in the extended sense of using timbral transformations and synthesis rather than the note-based MIDI routines usually found. Finally, the piece embodies a performer paradigm, in that the clarinetist is engaged in an expansive dialog with a second musical voice; a voice, moreover, that is a regeneration and reinterpretation of her own sound.

3.6 Multimedia Extensions

To this point we have examined an array of interactive systems, moving from improvisation through notated works to extended forms of interaction based on real-time signal processing. A number of developers have moved beyond the design of systems for music only, to building environments capable of integrating visual with musical effects. We will look at three of those endeavors here: two involved with synchronizing musical improvisation and real-time animation, and a third coupling lighting effects to interactive music analysis.

Don Ritter

Orpheus is an interactive animation system designed by the visual artist Don Ritter. The program accesses stored digital video frames, and can display these in any order, combined with one of over 70 cinematic transitions, including fades, flashing, and the like. Images and transitions are linked to and synchronized with real-time input from a MIDI source. Correspondences stem from a user-defined control file, which indicates relations between various musical parameters, and the visual effects that will appear in response. “For example, a control file may define that an increasing musical interval in the third octave will cause the display of a bouncing ball in a forward motion; a decreasing interval, however, will display the ball in a backward motion” (Ritter 1991, 1). Orpheus analyzes several features of the musical performance, including register, dynamic, pitch, note density, duration (of notes or rests), chord type, and timbre. Performances often include two instances of Orpheus, one producing background imagery, and the other foreground. Both use distinct images and control files.

The analysis performed relies on a comparison of incoming MIDI values with a collection of thresholds, which may be defined by the user. For example, MIDI pitches are classified as high or low according to their position relative to a pitch threshold. Loudness is decided the same way. The program keeps track of timing information, including when notes go on and off, to determine a note density per unit time. A control file can define correspondences between the results of these three classifications and sets of images and transitions. We may think of the three analyses as three binary switches; each classification takes one of two values. Therefore, the three taken together form a featurespace with eight locations. Each point in the featurespace can be associated with an “action list”: these are scripts specifying image frames and transitions, which will be applied when the corresponding group of classifications is found in the input.
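
The eight-location featurespace can be sketched as a lookup keyed by three binary classifications. This is an illustration rather than Orpheus's control file format: the threshold values, the frame set names, and the "fade" transition are placeholders.

```python
from itertools import product

# Hypothetical thresholds; in Orpheus these would be defined by the user.
THRESHOLDS = {"pitch": 60, "velocity": 64, "density": 4.0}  # density in notes per second

def classify(pitch, velocity, density, thresholds=THRESHOLDS):
    """Reduce three measurements to three binary switches (True means high)."""
    return (pitch >= thresholds["pitch"],
            velocity >= thresholds["velocity"],
            density >= thresholds["density"])

# One action list per point in the featurespace: 2 x 2 x 2 = 8 entries.
# Frame set and transition names are placeholders.
ACTION_LISTS = {
    point: [("show_frames", f"set_{index}"), ("transition", "fade")]
    for index, point in enumerate(product((False, True), repeat=3))
}

def respond(pitch, velocity, density):
    """Look up the action list for the current classification."""
    return ACTION_LISTS[classify(pitch, velocity, density)]

actions = respond(pitch=72, velocity=40, density=5.0)  # high, soft, dense
```

Adding a fourth binary feature would double the table to sixteen entries, which suggests why a coarse classification keeps the control file manageable.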

Additional Actions become active when note and rest durations become greater than their associated threshold values. This situation will cause a change in cinematic transitions, but not in frame selection. For example, if the rest threshold is set at two seconds and the long rest Action is set at “drop screen,” when a rest greater than two seconds is encountered, the currently displayed frame will slide to the bottom of the screen and display only the top 10% of the image. Imagery will stay in this “dropped” position until the next note occurs, after which the screen will jump back to its normal full screen position. (Ritter 1991, 8)

Don Ritter’s animation system has been used in collaboration with such improvisers as George Lewis, Richard Teitelbaum, Trevor Tureski, and David Rokeby. Because it is capable of characterizing some salient features of an ongoing musical performance, Orpheus can change its behavior in ways that are not known in advance but that nonetheless demonstrate correspondences which are immediately apparent to an audience.

Roger Dannenberg

Roger Dannenberg has carried out a number of important developments in interactive systems design, including compositions exploring the combination of algorithmic composition and interactive animation. In his piece Ritual of the Science Makers, scored for flute, violin, cello, and computer, Dannenberg uses MIDI representations of the human performances to launch and guide processes generating both musical textures and accompanying images. These are often related; in the heraldic opening of the work, for example, crashing sounds in the computer part are tied to graphic plots of the amplitude of the crashes, using images akin to trumpet bells (Dannenberg 1991).

The piece illustrates the complexity of combining processes controlling several media, or even several uses of the same medium, simultaneously. In section 2.2.1, we reviewed the cause() routine, a basic scheduling facility in Moxie and the CMU MIDI Toolkit. Ritual of the Science Makers relies on cause(), within a Moxc/C programming environment, to govern the scheduling of processes for music and animation. Certain operations of sequencing, performance capture, and interactive parameter adjustment, however, were better handled with the Adagio score language. To benefit from the advantages of process design and temporal sequences, the program for Ritual allowed Adagio scores to call C routines and set variables and allowed C to invoke Adagio scores. Adagio scores are interpreted and so can be quickly changed and reinserted into the performance situation, allowing a much more flexible and effective environment for the adjustment of parameter settings than would be the case if they had to be recompiled in the C program at each change.
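
The scheduling style that cause() supports can be suggested with a small sketch. This is a Python analogy and not the Moxc API: the Scheduler class, its virtual clock, and the pulse routine are hypothetical, and only the idea of asking for a routine to be called after a delay is taken from the text.

```python
import heapq
import itertools

class Scheduler:
    """Virtual-time scheduler loosely modeled on the idea behind cause()."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps same-time events in request order

    def cause(self, delay, routine, *args):
        """Request that routine(*args) be called delay time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._order), routine, args))

    def run(self, until):
        """Run every scheduled call whose time is at or before until."""
        while self._queue and self._queue[0][0] <= until:
            when, _, routine, args = heapq.heappop(self._queue)
            self.now = when
            routine(*args)  # runs to completion; nothing interrupts it
        self.now = until

log = []

def pulse(scheduler, pitch, period):
    """A process that plays one note and then reschedules itself."""
    log.append((scheduler.now, pitch))
    scheduler.cause(period, pulse, scheduler, pitch, period)

scheduler = Scheduler()
scheduler.cause(0, pulse, scheduler, 60, 100)
scheduler.run(until=250)
```

A process here is simply a routine that schedules its own next invocation, which is the pattern that makes long-running musical and graphic behaviors possible without threads.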

Most music-scheduling regimes require that the processes invoked not be interrupted except by a limited number of hardware-servicing routines. This nonpreemption requirement simplifies numerous control flow and data consistency problems. In a mixed animation/music environment, however, Dannenberg found that the combination of relatively slow graphics routines and music processes requiring a much greater degree of responsiveness was best handled by introducing a limited interrupt capability: “the solution I have adopted is to divide computations into two groups: (1) time-critical, low-computation music event generation, and (2) less critical graphical music computations. The graphical computations are off-loaded to a separate, low-priority ‘graphics’ process that is preempted whenever a time-critical event becomes ready to run in the primary ‘music’ process. To coordinate graphical and music events, messages are sent from the music process to the graphics process to request operations” (Dannenberg 1991, 153).
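
The division of labor Dannenberg describes can be approximated in a few lines. The sketch below is not his implementation: it simulates preemption cooperatively by cutting graphics work into one-tick slices, and the tick clock, event labels, and slice count are all assumptions.

```python
from collections import deque

def run(music_events, ticks, steps_per_draw=2):
    """Simulate a music process that always wins over a graphics process.

    music_events maps a tick number to the label of a time-critical event.
    Each music event also sends a message asking the graphics process to draw.
    """
    requests = deque()  # messages from the music process to the graphics process
    steps = deque()     # remaining slices of the graphics job in progress
    trace = []
    for tick in range(ticks):
        if tick in music_events:
            label = music_events[tick]
            trace.append((tick, "music", label))
            requests.append(label)
            continue    # graphics gets no time on this tick
        if not steps and requests:
            label = requests.popleft()
            steps.extend(f"{label}:{i}" for i in range(steps_per_draw))
        if steps:
            trace.append((tick, "graphics", steps.popleft()))
    return trace

trace = run({0: "crash", 2: "bell"}, ticks=6)
```

In the resulting trace the drawing requested by the first event is interrupted at tick 2 by the second music event and resumes afterward, so the music is never late even though the image arrives in pieces.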

Ritual of the Science Makers is a completely notated score, though the interaction would work with improvised material as well. In fact, the ensemble’s parts are simply transcriptions of studio improvisations recorded by the composer. Roger Dannenberg suggests that as multimedia applications are extended into real-time interaction, some of the relatively stable techniques used for purely musical systems, such as nonpreemptive scheduling, may have to be reconsidered. These problems become even more acute when signal processing and large-scale analytical processes are thrown into the mix.

Louis-Philippe Demers

The programmer and lighting designer Louis-Philippe Demers has created an interactive system that follows musical contexts as it changes lighting patterns during performance. The standard method for real-time lighting control saves collections of settings among which automated cross-fades can effect a gradual change. Demers’s system, @FL, instead describes lighting as a series of behaviors; moreover, these behaviors can be tied to musical contexts in production-style rules. For example, a lighting behavior in @FL could be “A happens quickly when B and C occur simultaneously, otherwise A happens slowly” (Demers 1991). @FL is a graphic programming language, quite similar in style to Max: objects are represented as boxes with inlets and outlets, and hierarchical patches can be programmed by describing the data flow between them.
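
The quoted rule can be paraphrased in code to show how a behavior differs from a stored setting. @FL itself is a graphic language, so the Python below is only an analogy; the rate values and the representation of musical context as a set of names are hypothetical.

```python
def fade_rate(context, fast=1.0, slow=0.1):
    """Rate of behavior A, in intensity units per second.

    A happens quickly when B and C occur together, otherwise slowly.
    """
    return fast if {"B", "C"} <= context else slow

def step(intensity, target, context, dt):
    """Move the light toward its target, limited by the rate the rule allows."""
    limit = fade_rate(context) * dt
    delta = target - intensity
    return intensity + max(-limit, min(limit, delta))

quick = step(0.0, 1.0, {"B", "C"}, dt=0.5)  # both conditions present
gentle = step(0.0, 1.0, {"B"}, dt=0.5)      # only one condition present
```

The same target intensity is approached at different speeds depending on what the music is doing, which is the sense in which the lighting is described as a behavior and not as a cue.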

Several performances have been realized through @FL. The earliest was a realization of Stravinsky’s Rite of Spring, played on two computer-controlled pianos in an arrangement by Michael Century. The lighting program changed the intensity of light on either piano as a function of the density of the music being performed on that instrument. In a collaboration with me, listener messages from Cypher were used to control lighting behaviors described with @FL in a performance of Banff Sketches. Zack Settel’s composition Eschroadepipel for clarinet and interactive music system similarly used ongoing information about the state of the musical score to direct changes in lighting behavior controlled through Louis-Philippe Demers’s system.