As a simple but understandable figure of the imagination, we each have in our minds a committee of “experts” which are the criteria we will consult when making decisions. These criteria are of various kinds: some are inherited, some are needs, but there are also appointed criteria, and there is a time in which they can and will be in this appointed position. If, however, you find repeatedly that this committee doesn’t come to a conclusion you actually approve of, you fire it. But then you have to find other criteria. Composition is a wonderful method for discovering not-yet-appointed criteria.
-Herbert Brün

Chapter 3 reviewed an array of existing interactive music programs, describing them integrally and in terms of their relation to the dimensions of the evaluation framework developed earlier. Here we will similarly abstract principles of existing programs away from their original vehicle of implementation, looking at formalisms for composing music with a computer. Strictly speaking, it is inaccurate to say that these programs are concerned with modeling human compositional techniques: usually, they are used to work out some procedure with greater speed, more accuracy, and less “cheating” than their human creators would be able to muster. In particular, composers use compositional algorithms to develop high-level methods, procedures that operate on a control plane governing the long-term evolution of notes and durations without forcing the composer to consider each event individually.

I consider composition methods as belonging to three broad classes: (1) sequencing, or the use of prerecorded musical fragments, (2) algorithmic generation from some small collections of pitch or rhythmic elements, and (3) transformation — simple, modular changes to already complete musical material.

• A sequence is a list of stored MIDI events, which can range in duration from gestures of a few seconds, to entire compositions many minutes long. Sequencing is by now quite familiar because of the collection of commercial editors available for recording and manipulating sequences. Most of the familiar systems, however, are not designed for real-time interactive use in live performance. Rather, the operative bias is in favor of preparing complete compositions to be played back from beginning to end, perhaps with some timing variations to synchronize with video or other machines. Our particular interest here will be in how such sequences can be varied and conditionally integrated with live musical contexts.
• The second class of composition methods comprises generative algorithms, which produce material from some stored collections of data (such as pitch or duration sets), often using constrained random operations to spin out a distinctive gestural type from the seed elements. Such routines can often be invoked with a number of performance parameters, to set such things as the duration within which to continue generation, a pitch range to be observed, the speed of presentation, etc.
• The transformation of material is accomplished by taking the input to an interactive system and systematically varying it along some parameter. A simple example would be to take all arriving Note On messages and add two to the pitch number before playing them back out. The effect of such a transformation would be, naturally, to transpose all incoming material up a whole step. Several transformation systems allow the combination of many simple changes, resulting in more complex, compound relations between input and output.
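The whole-step transposition just mentioned reduces to a few lines of C. In this sketch the function name is mine, pitches are assumed to arrive as an array of MIDI note numbers, and the clamp to the MIDI ceiling is an added safeguard not part of the description above.

```c
#include <stddef.h>

/* Transpose a block of MIDI pitch numbers up a whole step (two
   half steps), pinning results to the legal MIDI maximum of 127. */
void transpose_up_whole_step(int *pitches, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int p = pitches[i] + 2;
        pitches[i] = (p > 127) ? 127 : p;
    }
}
```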

6.1 Transformation

Richard Teitelbaum developed an intricate form of transformation in his Patch Control Language, written with Mark Bernard (Teitelbaum 1984), and in the more recent Max implementation of the same idea. Transformation in Teitelbaum’s system arises from combinations of a number of simple operations, such as delay, transposition, or looping. The generation of output with these modules is controlled by the composer during performance, as he calls up stored patches and applies them to some incoming MIDI stream. The mode of performance is predominantly improvisation, and the decisions of the composer concerning the choice of patches, which of several inputs to send through them, and the sounds to use in articulating the result shape the structure of the whole.

Figure 6.1

The Max patch shown in figure 6.1 is an adaptation of the inverter module from the version of Teitelbaum’s software implemented by Christopher Dobrian. The leftmost inlet turns the inverter on or off; pitch and velocity numbers are sent to the next two inlets across the top. If the inverter is on, 120 is subtracted from incoming MIDI pitch numbers. The absolute value of that result is taken as the new pitch number. Therefore, the point of symmetry is middle C (MIDI note 60): the absolute value of (60 - 120) is 60 again. Any pitch other than middle C will be flipped around to form the same interval with middle C as the original, but in the opposite direction — the C an octave above middle C will come out an octave below, and so forth. Compare this with the mirror module shown in section 3.5. The inversion algorithm used by Jean-Claude Risset has much the same effect but can change the point of symmetry. These inverters are perfect examples of the idea of transformation: input from some source is modified in a consistent, simple way and sent back out again or on through further transformations.
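The same inversion is one line of arithmetic in C. This sketch generalizes Teitelbaum’s fixed subtraction of 120 to an arbitrary point of symmetry; the function name and the clamp to the MIDI range are mine, not taken from either implementation.

```c
#include <stdlib.h>

/* Invert a MIDI pitch around a point of symmetry: with symmetry = 60
   (middle C), this is Teitelbaum's abs(pitch - 120). */
int invert_pitch(int pitch, int symmetry) {
    int p = abs(pitch - 2 * symmetry);
    return (p > 127) ? 127 : p;
}
```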

Transformation seems to be a method quite well suited to works involving improvisation; Teitelbaum’s usage is an example. Daniel Weymouth similarly makes prominent use of transformations in his composition This Time, This for Zeta violin and a Max patch written by the composer. The violin score is a combination of notated music and indications for improvisation; the improvisations sometimes follow constraints. For example, the pitch may be notated and the rhythmic presentation left free; conversely, pitches are sometimes improvised to notated rhythms. The computer part is an intriguing blend of transformation and generation. Material from the violin performance is often used as a seed for rather elaborate responses: at the center of the work, a harmonic complex used repeatedly throughout is compressed into a series of chords, which are triggered by single pitches coming from the violin. At other times, the work takes advantage of the fact that the controller can send information from each separate string on individual MIDI channels: toward the end of the piece, the upper strings spawn soft, high chords, while the lower ones provoke bursts of sound that break up into reverberation. An excerpt from This Time, This can be heard on the companion CD-ROM.

Transformation in Cypher is accomplished through the chaining of many small, straightforward modules. The action of these modules is cumulative: if more than one is used on some material, they are applied serially, with the output of one operation being passed to the input of the next. Although the action of any module taken singly is simple and easy to follow, longer chains of transformations can build up material that is quite complex, though deterministically derived from its inputs. In figure 6.2, a source chord is first sent through the arpeggiation module, which separates the pitches, and then through the looper, which repeats the material given it.

Figure 6.2

Level-1 Filters

The transformation objects implemented in Cypher all accept the same three arguments: (1) a message, which selects one of two possible methods in the object, (2) an event block, a list of up to 32 events to be transformed, and (3) an argument, whose function changes according to the message. The two messages accepted by transformation objects are xform and mutate. The xform message selects the method that applies a transformation algorithm to an argument number of events in the event block. The mutate message will use the argument to change the value of some parameter controlling the behavior of the transformation algorithm. In this case, the event block is left untouched. When called with the xform message, all transformation objects return a value that represents the number of events in the block to be output after the transformation is applied.

Often, several transformations will be applied to the same event block. In this case, the transformations will be executed serially, with the output of one transformation sent through to the input of the next. In this situation, it makes a difference in which order the transformations are applied. For example, the objects arpeggiate and grace will produce two different effects, depending on which of the two is applied first. If applied as grace -> arpeggiate, a quick grace note pattern will lead up to an arpeggiation of some input chord. In the arpeggiate -> grace order, the arpeggiation would be done first, and then all notes in the arpeggio would have a separate grace note figure added to them. Because of this difference, each transformation filter has associated with it a priority. Before applying a series of filters, the composition section will order the filters to be used by their priority index. Then, the filters will be executed serially, with the highest priority transformation performed first, the result of this sent on to the next-highest priority transformation, and so on. The user can change the priority of any filter through a simple manipulation of the interface. The association of priorities to filters is remembered from execution to execution as part of the program’s long-term memory. Further, priority orderings can be changed in performance as different reaction types demand different application sequences.
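The priority-ordered chaining can be sketched as follows. The Event fields, the filter signature, and the two toy filters are assumptions made for illustration, not Cypher’s actual declarations; the only behavior taken from the text is that filters are sorted by priority and applied serially, each one’s output block feeding the next.

```c
#include <stdlib.h>

/* Assumed event and filter shapes: a filter rewrites the block in
   place and returns the new event count. */
typedef struct { int pitch; int velocity; int offset; } Event;
typedef int (*Filter)(Event *block, int count);
typedef struct { Filter fn; int priority; } RankedFilter;

static int by_priority(const void *a, const void *b) {
    return ((const RankedFilter *)b)->priority -
           ((const RankedFilter *)a)->priority;   /* highest first */
}

/* Sort the selected filters by priority, then apply them serially,
   feeding each filter's output block to the next. */
int apply_filters(RankedFilter *filters, int nfilters,
                  Event *block, int count) {
    qsort(filters, nfilters, sizeof *filters, by_priority);
    for (int i = 0; i < nfilters; i++)
        count = filters[i].fn(block, count);
    return count;
}

/* Two toy filters: transpose up two half steps; double each velocity. */
static int transpose2(Event *b, int n) {
    for (int i = 0; i < n; i++) b[i].pitch += 2;
    return n;
}
static int louder2(Event *b, int n) {
    for (int i = 0; i < n; i++) b[i].velocity *= 2;
    return n;
}
```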

Descriptions of the level-1 transformation objects follow. The descriptions will be in terms of how the object changes the event block arriving as input, and what kinds of changes to the transformation behavior can be effected by incoming mutate messages.


The accelerator shortens the durations between events. Cypher events have associated with them a value called an offset, which is the duration separating it from the previous event. Shortening this value causes an event to be scheduled for execution sooner than would be the case were the offset left unaltered; this quickening of execution time results in the events being performed at an accelerated rate. The state variable controlling the behavior of the accelerator is called downshift. In the normal case, downshift is just the number of milliseconds subtracted from every event offset in the block. A mutate message can be used to change the value of downshift. Increasing it will cause events to be scheduled more quickly; decreasing the downshift value will slow down event scheduling.

The acceleration algorithm is applied to all events in the input block except the first. The first event in every block has something of a special status: normally, the first event will be performed with no delay. This is because we generally want the response to be as fast as possible. And in fact, a transformed event with no delay will be virtually simultaneous with the original input. To achieve the greatest responsiveness, the first event in a block is given an offset of zero, before all transformations. Accordingly, the accelerator has no effect on the first event (unless some other transformation has already replaced its offset with a positive value).
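A minimal C sketch of the accelerator, with the Event field names assumed: the first event is skipped as just described, and the clamp at zero is an added guard not mentioned in the text.

```c
/* Subtract downshift milliseconds from every offset except the
   first event's; negative offsets are clamped to zero (an added
   guard, not described in the text). */
typedef struct { int pitch; int velocity; int offset; } Event;

void accelerate(Event *block, int count, int downshift) {
    for (int i = 1; i < count; i++) {   /* first event untouched */
        block[i].offset -= downshift;
        if (block[i].offset < 0) block[i].offset = 0;
    }
}
```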

Figure 6.3

The Max subpatch in figure 6.3 could form the kernel of several kinds of time transformation, acceleration being one of them. The patcher shown has three inlets and two outlets. The inlets take MIDI pitch and velocity values and a delay time in milliseconds. The same pitch and velocity values sent to the object will be output after the specified delay has elapsed. To realize a Cypher-like accelerator, this fragment could be applied to stored events each comprising a pitch, velocity, and delay time. To play the events back faster, it would suffice simply to decrease the stored delay time before sending the events through the delay patcher.


The accenter puts dynamic accents on some of the events in the event block. There are two state variables associated with it: a count, which keeps track of how many events have been processed; and the variable strong, which indicates how many events are to pass before an accent is generated. The effect of the module is to place a dynamic accent once every strong events. A mutate message can be used to change the value of strong. Using mutate, higher-level processes can change the accentuation pattern. An accent is described in terms of the MIDI velocity scale, in which 127 is the loudest onset amplitude. The accenter sets the velocity of strong events at 127 and of other events (to highlight the accentuation) at a significantly lower value.
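In C the accenter reduces to a counter test. Here the running count is kept by the caller so that it persists across blocks; the unaccented velocity of 70 follows the Max version described below rather than any documented Cypher value.

```c
/* Place a velocity-127 accent once every strong events; all other
   events are set to a significantly lower velocity (70 here). */
typedef struct { int pitch; int velocity; int offset; } Event;

void accent(Event *block, int n, int strong, int *count) {
    for (int i = 0; i < n; i++) {
        if (++*count >= strong) {
            block[i].velocity = 127;   /* accented event */
            *count = 0;
        } else {
            block[i].velocity = 70;    /* unaccented event */
        }
    }
}
```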

Figure 6.4

The Max patch in figure 6.4 performs the accenter algorithm. MIDI Note messages with a velocity of zero (Note Off) are passed through unchanged. The select object connected to the notein velocity outlet sends nonzero velocities (associated with Note On messages) to the counter. With each incoming velocity, the counter is incremented by one. When the counter resets to zero (because it has reached the maximum), the select object beneath it will bang a message box, setting the velocity to 127. For every other value coming from the counter the velocity is set to 70. The number of notes which must pass before an accent is played can be changed with the dial.


The arpeggiation method unpacks chord events into collections of single-note events, where each of the new events contains one note from the original chord. The algorithm is straightforward: for each note in the input event, a new event is generated, with the pitch and velocity taken over from the original chord note. The state variable available for mutation is speed, which determines the temporal offset separating the events of the arpeggio. Arpeggiated events will be scheduled to play at intervals of speed milliseconds; therefore, increasing the speed variable will slow down the succession of arpeggiated notes, and decreasing the speed variable will quicken them.
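The unpacking step can be sketched like this; function and field names are assumptions. Following the offset convention described for the accelerator, the first new event is given an offset of zero so that it sounds immediately.

```c
/* Unpack one chord into single-note events separated by speed
   milliseconds; returns the number of events written to out. */
typedef struct { int pitch; int velocity; int offset; } Event;

int arpeggiate(const int *pitches, const int *velocities, int nchord,
               int speed, Event *out) {
    for (int i = 0; i < nchord; i++) {
        out[i].pitch = pitches[i];
        out[i].velocity = velocities[i];
        out[i].offset = (i == 0) ? 0 : speed;  /* first sounds at once */
    }
    return nchord;
}
```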

Figure 6.5

The rudiments of a Max arpeggiator are shown in figure 6.5. The pitches of the chord are stored in a table, and the counter object steps through them. There is an inlet to the subpatch, which expects the total number of notes in the chord. After each note is played, a bang is sent to the delay object, which will pass it on to the counter after some number of milliseconds has elapsed; this duration is set with the dial connected to delay’s right inlet. When the bang is passed on, the next note will sound, and so on. Note that this fragment is a perpetual arpeggiation machine: once begun, it loops through the chord indefinitely. Adding a termination path once the counter hits the maximum would make it equivalent to the Cypher arpeggiator.


The backward module takes all the events in the incoming block and reverses their order, so that the event which would have been played last will instead be output first, and so on. There is no mutation message provided for backward. In Max, a similar effect could easily be achieved by simply changing the direction inlet of a counter object reading through a table.


Basser plays the root of the leading chord identification theory, providing a simple bass line against the music being analyzed. The state variable sensitivity is available for mutation in this routine. Sensitivity refers to the confidence rating returned from the chord identification agent for the current theory. Each time basser is called, it queries the chord identifier twice: once to get the current root and again to read the confidence level. If the confidence level is higher than the value of sensitivity, a bass note will be played; otherwise the module remains silent.

Root notes are played within the second full octave on a piano keyboard, that is, within the MIDI pitch range 36 to 47. Further, new bass notes will be played no faster than one every 25 centiseconds; it is usually hard on synthesized bass sounds, and somewhat disconcerting, to hear a bass line with attacks several times a second. Basser does not funnel its output through the event block mechanism, but schedules its own performance. This is because there is no transformation of the input being performed; a new voice is being generated in addition to the original events.


The chorder module will make a four-note chord from every event in the input block. There is an array of three intervals, which will be used to limit the pitches in the chord. For every event in the block, the first pitch in the event is taken as a starting point, and the other three pitches are generated as intervals within the bounds given by the interval array. For example, the first additional pitch is chosen at random within 7 half-steps of the original note. The second must be within 15 half-steps, and the last within 23. If any pitches exceed the upper limit of the keyboard range, they are changed to the highest pitch on the keyboard. Using this algorithm, all chords generated by the chorder will be smaller than two octaves in total span, and contain four pitches more or less evenly distributed throughout that range. There is no mutation message available for this module.
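The interval-bounded construction reads directly into C. Two assumptions in this sketch: the added notes are taken above the starting pitch, and the MIDI ceiling of 127 stands in for "the highest pitch on the keyboard."

```c
#include <stdlib.h>

/* Build a four-note chord: the original pitch plus three notes
   chosen at random within 7, 15, and 23 half steps of it,
   pinned to the top of the MIDI range. */
void chord_from(int pitch, int out[4]) {
    static const int bounds[3] = { 7, 15, 23 };
    out[0] = pitch;
    for (int i = 0; i < 3; i++) {
        int p = pitch + (rand() % bounds[i]) + 1;
        out[i + 1] = (p > 127) ? 127 : p;
    }
}
```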

Figure 6.6

The Max patch in figure 6.6 is an implementation of the chorder algorithm. The velocity of incoming note messages controls the direction of the graphic switch at the top of the patch. Note On messages spawn new notes through additions to constrained random numbers. These additional pitch numbers are saved in the int objects at the bottom, after first being limited to the keyboard range, and sent on to noteout. When Note Off messages arrive, their zero velocity sends the pitch number out the left outlet of the graphic switch. Then they bang out the saved pitch numbers, turning off the same chord that was turned on initially.


The decelerator lengthens the duration between events. Increasing an event offset causes it to be scheduled for execution later than would be the case were the offset left unaltered; this slowing of execution time results in the events being performed at a decelerated rate. The state variable affecting the behavior of the decelerator is the upshift — the number of centiseconds added to every event offset in the block. A mutate message can be used to set the upshift to a new value. Increasing the upshift will cause events to be scheduled more slowly; decreasing the upshift value will speed up event scheduling.

For every event in the input, upshift is added to the offset if that lengthening results in a value less than 400 centiseconds. If the new offset exceeds that threshold — in other words, if the resulting offset would cause the event to be delayed by more than 4 seconds — 200 centiseconds are subtracted from the offset instead. This will cause the event to be scheduled two seconds earlier, in effect an acceleration; however, this quickening also means that subsequent decelerations will again cause a slowing within the 4-second margin. The overall behavior will be a gradual slowing, to the 4-second limit, one quick acceleration, and another gradual decelerando. The Max subpatch of figure 6.3, shown in connection with the acceleration module, could similarly form the basis of a decelerator. The only difference between the two would lie in the calculations used to determine the delay sent to the subpatch’s rightmost inlet.
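The decelerator with its 4-second wraparound can be sketched directly. Offsets here are in centiseconds, following the text; the Event fields are assumed names.

```c
/* Lengthen every offset by upshift centiseconds, unless the result
   would reach 400 centiseconds (4 seconds); in that case subtract
   200 centiseconds instead, scheduling the event two seconds earlier. */
typedef struct { int pitch; int velocity; int offset; } Event;

void decelerate(Event *block, int count, int upshift) {
    for (int i = 0; i < count; i++) {
        if (block[i].offset + upshift < 400)
            block[i].offset += upshift;
        else
            block[i].offset -= 200;   /* jump two seconds earlier */
    }
}
```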


The flattener is one of the few modules that will undo variation found in the input block. This module flattens out the rhythmic presentation of the input events, setting all offsets to 250 milliseconds and all durations to 200 milliseconds. The result is a performance that is significantly more machinelike than most inputs, since all attacks will be exactly evenly separated one from another. There is no mutation variable associated with this transformation. Again, the simple Max subpatch of figure 6.3 could achieve the same effect if it were sent a constant delay time with every pitch/velocity pair.


The glisser adds short glissandi to the beginning of each event in the input block. There is one state variable available for mutation, which controls the maximum length of the generated glissandi. The first step of the transformation method calculates a glissando length as follows: howmany = (rand() % length) + 1;. The random number generator is called, and its output used modulo the length variable, to limit the upper bound. One is added to the result to ensure that all executions will add at least one new event to each incoming event. Therefore, mutating the length variable will change the maximum number of notes allowed in a glissando.

Glissandi are generated from below the input note and run up to it in half-step increments. The interval between the input event and the first note of the glissando, therefore, depends on the calculated length. If, for instance, a glissando length of five is needed, the first pitch will be a perfect fourth below the input event, to allow five half steps of ornamentation going up to the original pitch. These new pitches are generated with a constant speed (70 milliseconds apart) and velocity (rather loud, MIDI value 110).
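Putting the length calculation together with the pitch rule gives a compact sketch; the buffer handling and names are mine.

```c
#include <stdlib.h>

/* Prepend an upward chromatic run to a pitch: at most length notes,
   each 70 ms apart at velocity 110. Returns how many notes were
   written into out, which must hold at least length entries. */
typedef struct { int pitch; int velocity; int offset; } Event;

int gliss_into(int pitch, int length, Event *out) {
    int howmany = (rand() % length) + 1;     /* at least one note */
    for (int i = 0; i < howmany; i++) {
        out[i].pitch = pitch - howmany + i;  /* run up in half steps */
        out[i].velocity = 110;
        out[i].offset = 70;
    }
    return howmany;
}
```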

Figure 6.7

Figure 6.7 is a Max glissando patch. When a new note arrives, it provokes a new random number within the range set by the dial and opens the gate used to control the delay object later. The random number is subtracted from the pitch to set the value of a counter; the pitch itself sets the counter’s maximum value. Then, the counter will play through the pitches of the glissando; after each one is produced, a bang message is delayed 70 milliseconds before making the next note. When the gliss has reached the original pitch number, the gate flips to the left outlet, and no more bangs will reach delay, thus stopping the process.


The gracer appends a series of quick notes leading up to each event in the input block. Every event that comes in will have three new notes added before it. All of these new notes will be at offsets of 100 milliseconds, resulting in a fast, grace-note-like ornamentation of the original event. The added pitches are chosen at random from a range set by the space variable. Grace notes will usually be chosen to appear below the original pitch; random numbers are generated within the space range and then subtracted from the event pitch. If this results in a note below MIDI number 40, the modification is instead added to the original pitch, producing a grace-note figure that leads down to the original event. The space variable can be modified with a mutate message, changing the pitch range within which grace notes will be selected.
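A sketch of the gracer’s pitch selection follows; the grace-note velocity is an assumed value, since the text specifies only the 100 ms offsets.

```c
#include <stdlib.h>

/* Choose three grace notes within space half steps below the
   original pitch; if a note would fall below MIDI 40, add the
   deviation instead, so the figure leads down to the event. */
typedef struct { int pitch; int velocity; int offset; } Event;

void grace_into(int pitch, int space, Event out[3]) {
    for (int i = 0; i < 3; i++) {
        int dev = (rand() % space) + 1;
        int p = pitch - dev;
        if (p < 40) p = pitch + dev;  /* too low: approach from above */
        out[i].pitch = p;
        out[i].velocity = 100;   /* assumed, not specified in the text */
        out[i].offset = 100;     /* 100 ms apart */
    }
}
```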

Figure 6.8

The Max patch in figure 6.8 performs a transformation like the gracer. For every pitch played, three new notes will be produced at random within 15 semitones below the original (this range can be changed with the dial). Offsets of 100 milliseconds separate each pitch. After the three grace notes, the original pitch is played out, and the process stops until another note is performed. The only difference between figure 6.8 and the Cypher version is that no notice is taken here of the tessitura; grace notes will come from below no matter how low the original pitch is in the range.

The harmonizer modifies the pitch content of the incoming event block to be consonant with the harmonic activity currently in the input. The basic idea of the algorithm is the following: After ascertaining the current chord and key, pitches in the event block are nudged into a scale consonant with those analyses. The contour of the pitches is maintained as closely as possible; pitches dissonant with the chord and/or key are moved up or down one or two half steps to a consonant pitch. For example, an F sharp found in an event block associated with the tonic chord in C major would be nudged up to a G natural, changing a highly dissonant pitch to a consonant member of the tonic triad.

The specifics of how to modify incoming pitches are maintained in the nudge array, which has three dimensions corresponding to mode, function, and original pitch. The first dimension, mode, has two possible values, which represent major and minor versions of the current key. This allows an immediate selection between two kinds of modification, one for each mode. The second dimension goes into the mode-specific transformations and finds the list associated with the function of the current chord. Function is used in the music-theoretic sense here: a C major triad in the key of C major, for example, has a tonic (or, in the program’s numbering scheme, zero) function. There are 24 possible functions, corresponding to the major and minor versions of triads built on the twelve scale degrees. The final dimension of the nudge array comes from the pitch number of the incoming note, which is considered regardless of octave, meaning that the value of this dimension can be one of twelve possibilities.
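The table lookup at the heart of the harmonizer might look like this. Only one row of the three-dimensional nudge array is shown, and its correction values are illustrative guesses, not Cypher’s data.

```c
/* One row of a nudge table: signed half-step corrections for each
   pitch class sounding over a major tonic chord. The values here
   are illustrative only. */
static const int nudge_major_tonic[12] =
 /* C  C#  D  D#  E  F  F#  G  G#  A  A#  B */
  { 0, -1, 0,  1, 0, 0,  1, 0, -1, 0,  1, 0 };

/* Nudge a MIDI pitch by the correction for its pitch class. */
int harmonize_pitch(int pitch) {
    return pitch + nudge_major_tonic[pitch % 12];
}
```

The F sharp example from the text falls out directly: pitch class 6 over the tonic in C major is nudged up one half step to G.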

With these three dimensions, the harmonizer is equipped to transform any incoming pitch class, played against any chord function in either the major or minor modes of the key. There are no mutation possibilities associated with this module, but there should be: according to the style of music being performed, the nudge array itself should be replaced. Some styles place stricter constraints on allowable dissonances, and the way these are resolved, than others. If the program were able to recognize styles and had at hand a repertoire of nudge arrays corresponding to them, it could swap in the appropriate harmonic behavior for each known style.


The inverter takes the events in the input block and moves them to pitches that are equidistant from some point of symmetry, on the opposite side of that point from where they started. In other words, all input events are inverted around the point of symmetry. For example, if the symmetry point were middle C, all pitches above middle C would be transformed to others the same distance below middle C as the original pitches were above it. High becomes low, and the higher the original pitch, the lower the transformed pitch. The point of symmetry can be altered with a mutate command: the default setting is 64, which is the MIDI pitch number for the E above middle C, the exact center of an 88-key piano keyboard. A mutation can change this point to any other MIDI pitch number. Pitches that, after transformation, would exceed the MIDI pitch boundaries are pinned at either the highest or lowest possible pitch, whichever is closer to the calculated value. We have already seen two Max versions of the same idea: the mirror subpatch from Jean-Claude Risset’s Duet in section 3.5, and the inverter from Richard Teitelbaum’s software shown in figure 6.1.


The loop module will repeat the events in the input block, taken as a whole. In other words, all of the events in the block will be performed once, then all will be performed again, and so on, until the desired number of loops is reached (in contrast to a looping scheme in which each individual event would be repeated some number of times before continuing on to the next). The number of loops performed depends in part on the limit variable. There are always a minimum of two loops performed, including the original events. So the least action taken by this module would be to repeat the original events once. Additional repeats are generated at random, up to the value of limit. For example, if limit were equal to four, the module would perform randomly from one to five repeats. The limit can be reset with a mutate message.
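A sketch of the whole-block repetition: the distribution of the random repeat count is one reading of the description above, and out must hold (limit + 2) * n entries in the worst case.

```c
#include <stdlib.h>

/* Repeat a whole block of pitches: the original pass plus one to
   limit + 1 repeats, so at least two passes in all. Returns the
   number of pitches written to out. */
int loop_block(const int *pitches, int n, int limit, int *out) {
    int repeats = 1 + (rand() % (limit + 1));   /* 1..limit+1 */
    int passes = 1 + repeats;                   /* original included */
    for (int p = 0; p < passes; p++)
        for (int i = 0; i < n; i++)
            out[p * n + i] = pitches[i];
    return passes * n;
}
```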

Figure 6.9

The Max patch of figure 6.9 is the essence of a looper. Banging on the button at the top of the patch will cause it to loop through the first four elements in the table three times. The carry count outlet of the counter object tells how many times the counter has gone from its minimum to maximum values. Once the carry count has reached 3, in this patch, the switch sending bangs through the delay back up to the counter is set to the off position. In this example, all the patch does is flash the bang button under the table object twelve times. One can easily imagine how this could be extended to provide a Cypher-style looper: if the table were filled with MIDI pitch numbers, routing the table outlet to a makenote/noteout pair, for example, would loop through the pitches in the table.


The louder module adds crescendi to the events in the input block. This is another case in which the transformation would not be heard on blocks of only one event. The solution here is to add a second event to singleton blocks. The amount of velocity change is arrived at quasi-randomly; that is, the following calculation is used: change = (rand() % limit) + 1;. This varies the velocity modification randomly up to a maximum set by the variable limit, with a minimum value of 1.

Each event in the block will receive an increasing augmentation of its velocity. The first event has change added to it. Before each succeeding event, change is increased, then added into the next event velocity. If ever the new velocity exceeds the MIDI maximum (127), the value of change itself is used. Then the algorithm proceeds as before. The effect is a crescendo, more or less gradual (depending on the value of limit), up to the MIDI limit, followed by a sudden drop in loudness, followed by another gradual crescendo. A mutate message can be used to change the value of limit, effectively varying the speed of loudness change.
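The crescendo logic can be sketched as follows. How change grows between events is not specified in the text, so the per-event increment used here (another constrained random value) is an assumption.

```c
#include <stdlib.h>

/* Apply an increasing velocity boost across the block; when the
   boosted velocity would pass the MIDI maximum of 127, the value
   of change itself is used instead, restarting the crescendo. */
typedef struct { int pitch; int velocity; int offset; } Event;

void crescendo(Event *block, int n, int limit) {
    int change = (rand() % limit) + 1;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            change += (rand() % limit) + 1;  /* assumed growth rule */
        int v = block[i].velocity + change;
        block[i].velocity = (v > 127) ? change : v;
    }
}
```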


This module adds an obbligato line high in the pitch range to accompany harmonically whatever activity is happening below it. The algorithm for doing this is very simple: the chord agency is queried for the root of the current harmonic area. The root is played out in the octave starting two octaves above middle C. The other wrinkle on this behavior is that the module only plays out an obbligato note for every fourth event in the incoming event block. This ensures that the obbligato line will move more slowly than the other material. The frequency, in events, of obbligato output can be changed by sending the module a mutate message with a new setting.


Ornamenter adds small, rapid figures encircling each event in the input block. Two new events are added for each one coming in. The new events will circle the original pitch, with one new event above it and one below. The distances above and below the original pitch are chosen at random, within a boundary controlled by the width variable. The calculation is change = (rand() % width) + 1;. Width keeps the output of the random number generation within a boundary; one is added to make sure that all new pitches will differ from the original by at least one half step. The new events have a constant offset (100 milliseconds) and velocity (110 MIDI). The width variable can be changed with a mutate message.
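A C sketch of the encircling figure; following the Max patch in figure 6.10, the same random deviation is used above and below the original pitch, though the Cypher code may draw them independently.

```c
#include <stdlib.h>

/* Generate one event below and one above the original pitch, at a
   random distance of one to width half steps, with the constant
   offset (100 ms) and velocity (110) given in the text. */
typedef struct { int pitch; int velocity; int offset; } Event;

void encircle(int pitch, int width, Event out[2]) {
    int change = (rand() % width) + 1;
    out[0].pitch = pitch - change;   /* below the original */
    out[1].pitch = pitch + change;   /* above the original */
    for (int i = 0; i < 2; i++) {
        out[i].velocity = 110;
        out[i].offset = 100;
    }
}
```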

Figure 6.10

The Max patch in figure 6.10 implements an ornamenter. When a new note arrives, it causes random numbers within the range set by the dial to be sent to the plus and minus objects on the left side of the patch. First, the random number is subtracted from the original and played out. After a delay of 100 milliseconds, the same random number is added to the original and played. Finally, 100 milliseconds later, the original pitch itself is performed, and the process stops.


The phraser module temporally separates groups of events in the input block. Adding a significant pause between two successive events plays on one of the strongest grouping cues and tends to induce a phrase boundary at the break. The state variable phrase_length determines how many events will pass before a pause is inserted. When phrase_length events have gone by, the offset of the following event will be lengthened at random by a duration somewhere in the range of 400 to 1670 milliseconds. The value of phrase_length can be changed with a mutate message.


Quieter adds decrescendi to the events in the input block. Again, the effect would not be heard on blocks of only one event, so a second event is added to singleton blocks. The amount of velocity change is arrived at quasi-randomly; that is, the following calculation is used: change = (rand()%limit)+1. This varies the velocity modification randomly, with a minimum of 1 and a maximum equal to the value of limit. The value of change is subtracted from the velocity of the first event in the block (which will have become two events if the original input was a single event). For each succeeding event, first the value of change is augmented, and then the event’s velocity is diminished by that value. If the newly calculated velocity goes below 10, change is subtracted from 127 (the maximum MIDI velocity), and the result is used as the new event velocity. The effect will be a gradual decrescendo (the speed of which is determined by limit), an abrupt jump to a loud velocity, and another gradual descent. The value of limit can be changed with a mutate message, varying the speed of loudness change.
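The decrescendo-with-wraparound logic might be sketched in C as follows; limit is the state variable from the text, while the function name and array representation are hypothetical.

```c
#include <stdlib.h>

static int limit = 8;   /* bounds the initial velocity change; mutable via a mutate message */

/* Apply a gradual decrescendo to the velocities in a block.  The
   change grows by one with each succeeding event; when a velocity
   would fall below 10, the value wraps to 127 - change, producing
   the abrupt jump to a loud velocity described in the text. */
void quieter(int *velocity, int nevents) {
    int change = (rand() % limit) + 1;   /* 1..limit */
    for (int i = 0; i < nevents; i++) {
        int v = velocity[i] - change;
        if (v < 10)
            v = 127 - change;
        velocity[i] = v;
        change += 1;   /* augmented for each succeeding event */
    }
}
```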


This module adds four pitches to each input event, in a kind of sawtooth pattern. It is similar to the ornamenter in that it also adds new material above and below the original pitch. In this case, two pitches will be added above and two below, in alternation. The amount by which these new notes will deviate from the input is determined by the state variable width. The value of width can be changed with a mutate message.

Figure 6.11


Figure 6.11 is a sawer patch produced with some modifications to the ornamenter shown in figure 6.10. Here, there is a counter object after the delay. A modulo operation follows the counter, such that even counts will cause a pitch above the original to be generated, and odd counts, a pitch below the original. This sawer will generate the incoming pitch first, then four new pitches, two above and two below, in alternation. The interval between the original and modified pitches will change with each note, because the random object is prodded for a new value each time.


The solo module is the first step in the development of a fourth kind of algorithmic style, lying between the transformative and purely generative techniques. Because its operation is not a transformation in the same sense as the other processes described in this section, I will reserve discussion of solo for section 6.4.


The stretcher affects the duration of events in the input block, stretching them out beyond their original length. The state variable mod controls the range of variation that will be introduced. The calculation of the new duration for each event in the block is coded thus: duration = (rand()%mod)+500. First, a random number is generated; this is limited to the maximum specified by mod. The result of the modulo operation is added to 500, giving a minimum lengthening of 500 milliseconds. The maximum lengthening is determined by mod, and the value of mod can be changed with a mutate message. A similar effect could easily be implemented in Max; a calculation such as the one outlined above would be needed to produce duration values fed into a makenote object, for example.
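The calculation named in the text can be stated directly in C; mod is the state variable from the description, and only the function wrapper is added here.

```c
#include <stdlib.h>

static int mod = 1000;   /* range of variation; mutable via a mutate message */

/* duration = (rand()%mod)+500: the modulo limits the random number to
   a maximum of mod-1, and adding 500 gives a minimum lengthening of
   500 milliseconds. */
int stretch_duration(void) {
    return (rand() % mod) + 500;
}
```

Values such as these could be fed into a makenote object to produce the analogous effect in Max, as the text notes.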


Swinger modifies the offset times of events in the input block. The state variable swing is multiplied with the offset of every other event; a value of swing equaling two will produce the familiar 2:1 swing feel in originally equally spaced events. If the input events are not equally spaced to begin with, the swing modification will have more complex results. The value of swing can be changed with a mutate message.
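A minimal C sketch of this behavior follows; which alternate events are scaled, and the function name, are assumptions not specified in the text.

```c
/* Multiply every other offset by the swing factor.  With equally
   spaced input and swing = 2, this yields the familiar 2:1 swing
   feel; unequal spacing produces more complex results. */
static double swing = 2.0;   /* mutable via a mutate message */

void swinger(double *offsets, int nevents) {
    for (int i = 0; i < nevents; i += 2)   /* every other event */
        offsets[i] *= swing;
}
```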


The thinner reduces the density of events in the input block. Most of the transformations, if they change the number of events presented at the input, change it by adding new events. Consequently, the density of events emanating from the composition section can reach quite significant levels. Thinner is one tool for reducing the amount of material coming from the player. The state variable thin controls how the reduction is done. A count is associated with the module, which is incremented with each note of every event. When thin divides evenly into the count, the corresponding Note is deleted, and the offset of the Event that contains the Note is increased by one second. For example, if thin were 3, and an incoming event held six Notes, two of the Notes would be deleted and the offset of the Event would be increased by two seconds. The density of incoming material will be reduced, then, in two ways: some chords will hold fewer notes, and pauses of one or more seconds will be added between some events. The value of thin can be changed with a mutate message.
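The counting logic might be sketched in C as follows; thin and the running count are from the text, while the function signature, which reduces an event to a note count and an offset, is a simplification invented for illustration.

```c
static int thin  = 3;   /* controls how the reduction is done; mutable */
static int count = 0;   /* incremented with each note of every event */

/* Process one event holding nnotes notes.  Every note whose running
   count is a multiple of thin is deleted, and each deletion adds one
   second to the containing event's offset.  Returns the number of
   notes kept. */
int thinner(int nnotes, long *offset_ms) {
    int kept = 0;
    for (int i = 0; i < nnotes; i++) {
        if (++count % thin == 0)
            *offset_ms += 1000;   /* deleted note lengthens the pause */
        else
            kept++;
    }
    return kept;
}
```

With thin set to 3, an incoming event of six notes loses two of them and gains two seconds of offset, matching the example in the text.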


To do the tightenUp, events in the input block are aligned with the beat boundary. On entering the module, the beat agency is queried for the current beat period. Next, the offset times of all events in the input block are added together. If the combined offsets are equal to the beat period, then the final event of the block will sound on a beat boundary. This is the desired effect of the module: if the offsets are already in such an alignment, the method returns with no further action. If the combined offsets have a duration less than the beat period, the last offset is extended by the difference between the two — again placing the final event in the block on the beat. If the combined offsets’ duration is longer than the beat period, the difference is divided by the number of events in the input block, and the result is subtracted from every event offset. In other words, all events are quickened by a constant value to make the final event again fall on the beat. There is no mutate message associated with this module, and the transformation does not add any events to the ones already present on input.
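The alignment procedure might be sketched in C as follows; the beat_period argument stands in for the query to the beat agency, and the function name is illustrative.

```c
/* Adjust the offsets in a block so that its final event falls on a
   beat boundary. */
void tighten_up(long *offsets, int nevents, long beat_period) {
    long sum = 0;
    for (int i = 0; i < nevents; i++)
        sum += offsets[i];
    if (sum == beat_period)
        return;                                     /* already aligned */
    if (sum < beat_period) {
        offsets[nevents - 1] += beat_period - sum;  /* stretch the last offset */
    } else {
        long cut = (sum - beat_period) / nevents;   /* quicken all events equally */
        for (int i = 0; i < nevents; i++)
            offsets[i] -= cut;
    }
}
```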


Transposer changes the pitch level of all the events in the input block by some constant amount. The distance by which the pitches will be moved is calculated as follows: interval = (base%limit), where base is a random number, limit keeps the value of that random number within some bound, and interval is set to the result. Then, for each event in the block, interval is added to the current pitch. If the resulting pitch exceeds the upper limit of the pitch range, interval is instead subtracted from the original event pitch. So, transpositions will generally be upward, though near the top of the pitch space they will move the other way. The value of limit can be changed with a mutate message. A good Max transposer from Richard Teitelbaum’s software is shown in figure 3.19.
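A C sketch of the per-pitch logic follows; limit is from the text, while the use of 127 as the top of the pitch range and the function name are assumptions.

```c
#include <stdlib.h>

static int limit = 12;   /* bounds the transposition interval; mutable */

/* interval = (base % limit), with base a random number.  Transposition
   is normally upward; near the top of the range it flips downward. */
int transpose(int pitch) {
    int interval = rand() % limit;
    if (pitch + interval > 127)
        return pitch - interval;   /* exceeded the range: subtract instead */
    return pitch + interval;
}
```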


The tremolo module adds three new events to each event in the input block. The new events have a constant offset of 100 milliseconds. They surround the original pitches, with two new events above the original and one below, or two below and one above; in either case, the higher and lower new events alternate. The distance by which new pitches will be removed from the original is determined by the following calculation: first a random number is generated. This is limited to a maximum of 12, then added to 2. The addition ensures that all new pitches will differ from the original by at least one whole step. The maximum deviation between original and new pitches is an octave plus one whole step. There is no mutate message associated with this module.
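The interval calculation can be stated in C directly from the description; only the function name is invented.

```c
#include <stdlib.h>

/* A random number limited to a maximum of 12, plus 2: at least one
   whole step, at most an octave plus a whole step (14 semitones)
   away from the original pitch. */
int tremolo_interval(void) {
    return (rand() % 13) + 2;   /* 2..14 semitones */
}
```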

Figure 6.12


A few small changes allow us to change the sawer patch from figure 6.11 into a version of the tremolizer, shown in figure 6.12. Here, the intervals of the new pitches are the same in both directions, because the random object is not asked for a new value between attacks. Further, a simple change to the counter ensures that three new pitches will be generated for each incoming event, not four. This demonstrates, first, that Max patches can easily be modified to achieve a range of similar behaviors from one basic algorithm. Second, if the changes being made here were subject to modification during performance (adding a dial to set the upper bound of the counter object, for example), this same range of behaviors could be generated onstage in response to varying musical circumstances.


Triller adds four new events to each event in the input block. These will be performed as a trill above or below the original pitch. The new events have a constant offset of 100 milliseconds. The placement above or below the source pitch is determined randomly, but trills that would come out below MIDI pitch 30 are always played above the source, and trills that would come out above MIDI pitch 100 are always played below. The trills will be the source pitch alternating with a trill pitch either a half step or a whole step above or below it. The entire figure will begin and end on the source pitch, alternating with two trill notes. The choice between half- or whole-step trills is made randomly. There is no mutate message associated with this module. Again, a simple modification of the basic Max patch shown in figures 6.10, 6.11, and 6.12 could produce the same effect: if the tremolo patch from figure 6.12 were initialized to produce random numbers not larger than 2, a version of the Cypher triller would result. The only differences are that the Max version would not be sensitive to register and that the Max patch begins on the source note but does not end on it.

6.2 Generation Techniques

A small but critical distinction separates transformation methods from generation techniques. As we have seen, transformation methods perform some operation on representations emanating from an external source. In the case of generation techniques, the output of a compositional formalism is derived solely from the operation of the formalism itself, possibly aided by stored tables of material. A way to distinguish between generation techniques and transformation, then, is to note that transformation methods require live input and generation techniques do not. This distinction is a fine one, and tends to blur easily, as do many of the classification metrics we have been discussing. Still, it seems to capture a noticeable difference in the way composers approach algorithmic methods: either the machine is changing something it hears or is generating its own material from stored data and procedures.

In Cypher, generation methods are used to produce clearly distinguishable textures and accept messages to influence their behavior during any particular performance. For example, the tremolo algorithm produces a rapidly changing pitch field that sweeps up the keyboard range, folding around to the bottom of the range when it has reached the top. The overall duration of the process is established with the set message. Within that duration, the algorithm calls itself many times, using continue messages. The temporal offsets between these self-invocations are determined by a calculation that alternately lengthens and shortens the duration between calls. Each invocation of the routine includes a seed pitch, and a fixed set of three intervals determines the notes generated around this seed. Every seed will produce ten new pitches, arrived at by continually cycling through the array of intervals. Three and one-third repetitions of the interval array are heard for each invocation of the routine. The seed pitch itself, then, is shifted upward with each new call of the routine until it reaches the top of the range, at which point it is folded back around to the bottom. The entire process continues until the duration specified in the set message has elapsed. All of the generation methods produce short fragments when selected from the alg menu; they will use the Bank and Timbre sounds currently selected.

Clarence Barlow has a highly developed algorithmic music generation method, which he has implemented in a real-time program called AUTOBUSK (Barlow 1990). Following the ideas laid out in his article “Two Essays on Theory” (Barlow 1987), and brought to compositional fruition in the remarkable Colgluotobüsisletmesi for solo piano (1978), AUTOBUSK uses concepts of stability, clarity, consonance, and dissonance to affect the microtonal and rhythmic behavior of musical textures. It is a generation method that uses seed material supplied by the user to spin out music according to its algorithmic principles. Each instance of the program can realize three voices in real time. “AUTOBUSK can be run autonomously or — using more than one computer — in series or in parallel: a serial connection causes MIDI output to be interpreted as input control by the next computer along in the line, whereas a connection in parallel permits synchronized sets of six, nine or higher multiples of three voices” (Barlow 1990, 166).


An example of an interactive generation method is the Pat-Proc program written by Phil Winsor of the Center for Experimental Music and Intermedia at the University of North Texas (Winsor 1991). Pat-Proc is divided into two parts: the first part generates pitch material for manipulation by the second part, which performs various operations, prominently including looping techniques. The program first realized generation methods derived from minimalism and later expanded these to cover a wider stylistic range. The two hallmarks of generative methods are quite easily seen from the way the program is divided: stored material is manipulated by procedures generating recognizable musical textures. The procedures are governed by control variables whose manipulation allows the composer to affect the generation of any particular output.

In fact, the division of Pat-Proc into two tasks, material definition and melody writing, allows two levels of generation method. In the first task, a number of procedures are available for generating the pitch material, which will be elaborated further during melody writing. To define the basic pitch material, a user may define the pitch set by hand, use a constrained sieve to generate pitches, or define a basic interval set, which will be spun out into scales by the program. The melody-writing section of the program generates linear voices from the basic scales, using a collection of methods. Pitches can be chosen from the scale serially, following a random distribution, or through an interval sieve. Further, the user can specify percentages of rests to be included or of the number of notes to receive ornamentation. Several voices can be generated in parallel, and rhythmic procedures can be applied to control their polyphonic presentation.

Pat-Proc is not used in live performance, but a great many of its features could easily be adapted to such usage. As it exists, Pat-Proc performs all the necessary calculations for each voice in the resultant texture and then produces output in one of a variety of formats: an alphanumeric note list, a MIDI sequence, or conventional music notation. The characteristics of the algorithm, however, are what interest us here: Pat-Proc, as it is conceptualized and used, exemplifies an algorithmic generation method. In a two-stage process, some stored basic material is defined and then elaborated compositionally by a number of procedures whose function can be affected by a user through a collection of control variables.

Markov Chains

Markov chains formed the basis of several compositions by Lejaren Hiller and his colleagues from the early 1960s, and have since been implemented in a number of interactive systems. Markov chains are series of linked states. Each state moves to a successor state, in what is called a transition (Ames 1989). The state at the beginning of a transition is the source; the state at the end is the transition’s destination. In a Markov chain, each successive destination of one transition becomes the source of the next. The behavior of the chain is captured by a table of transition probabilities, which gives the likelihood of any particular destination being reached from some source.

In Experiment 4 of Hiller and Isaacson’s Illiac Suite (1957), the transition table was weighted to produce melodies favoring harmonic (consonant destinations) and proximate (small-interval destination) continuations (Ames 1989, 176). Markov chains can be made to index transition tables that rely on more than one source event. The number of previous states used to determine the next destination state is called the order of the chain. Chains that use only the source state to determine a transition are called first order; chains that use the two most recent states to find the transition to the next are second order, and so on.
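A minimal first-order chain can be sketched in C as follows. The three-state transition table is invented for illustration; it is not drawn from Hiller's or any other cited work.

```c
#include <stdlib.h>

#define NSTATES 3

/* transition[src][dst]: the probability that a transition from source
   state src reaches destination state dst.  Each row sums to 1.0. */
static const double transition[NSTATES][NSTATES] = {
    { 0.1, 0.6, 0.3 },
    { 0.5, 0.2, 0.3 },
    { 0.3, 0.3, 0.4 },
};

/* Choose a destination for the given source by sampling the source's
   row of the table.  The destination returned here becomes the source
   of the next transition, which is what makes the process a chain. */
int next_state(int source) {
    double r   = (double)rand() / RAND_MAX;
    double cum = 0.0;
    for (int dst = 0; dst < NSTATES; dst++) {
        cum += transition[source][dst];
        if (r <= cum)
            return dst;
    }
    return NSTATES - 1;   /* guard against floating-point rounding */
}
```

A second-order chain would index the table by the two most recent states rather than one, and so on for higher orders.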

In interactive systems, either explicitly Markov-like transition tables or closely related probability schemes have been used as generative algorithmic techniques. A realization of Markov techniques in Max, developed by Miller Puckette, has been used at IRCAM in compositions by Philippe Manoury, including Jupiter and La partition du ciel et de l’enfer. Saxophonist and interactive system designer Steve Coleman has written a drummer program, which uses probabilities to decide on the placement of percussion sounds on any sixteenth pulse of a phrase, in a variety of styles. These probabilities are conditioned by immediately preceding choices, in first-order Markovian fashion. Similarly, George Lewis uses probabilities to select from tables of stored melodic and rhythmic material, and these probabilities are again modified by the successions actually played out.

Max Generation Objects

We have reviewed extensively the use of Max to transform musical material; several composers have used the language to implement more strictly generative processes as well. Jeff Pressing described a general method for algorithmic composition in his article “Nonlinear Maps as Generators of Musical Design” (Pressing 1988). On the companion CD-ROM, the Max patch quaternmusic implements a realization of these ideas. The user controls the behavior of the generator by manipulating four variables: “The behaviour of the central equation, a four-dimensional quaternion logistic map, depends critically on the value of the parameter a. The four components of a, that is, a1, a2, a3, and a4 are the parameters manipulated by the musician. Depending on the value of a, the equation can produce fixed points, limit cycles, intermittency, chaos, divergences to infinity, and other types of behaviour” (Pressing 1991, 1).

Figure 6.13


The Max object MAXGen was developed at the Université de Montreal by composer Jean Piché and his team. MAXGen allows the specification and execution of arbitrarily complex control functions, which can be applied to synthesis or compositional parameters in real time. Envelopes, tendency masks, and step functions can be drawn in a graphic editor within Max or “played” into the object through MIDI. Once defined, functions can be stretched, compressed, copied, or merged. Figure 6.13 shows an editing window from MAXGen. The tool icons arrayed in the upper left-hand corner represent various drawing and selection operations. Other controls change the representation on the screen and initiate processing of selections from the editing window, including stretching and the other operations listed previously. The notion of a tendency mask is extensively supported in MAXGen: in the work of several composers, including most notably G. M. Koenig and Barry Truax, tendency masks are used to vary the range of a random number generator over time. As the limits of the mask are adjusted, they set the upper and lower bounds for the generation of random numbers. The resulting values are then used to control pitch, durations, or synthesis parameters.
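The tendency-mask idea can be sketched in C as follows; the linear interpolation of the bounds is an assumption made for illustration, since masks in practice may follow any drawn contour.

```c
#include <stdlib.h>

/* Sample a tendency mask at time t (0.0 to 1.0 across the mask).
   The upper and lower bounds, interpolated linearly here, confine
   the output of the random number generator at each moment. */
double tendency(double t, double lo_start, double lo_end,
                double hi_start, double hi_end) {
    double lo = lo_start + t * (lo_end - lo_start);
    double hi = hi_start + t * (hi_end - hi_start);
    double r  = (double)rand() / RAND_MAX;   /* 0.0 .. 1.0 */
    return lo + r * (hi - lo);
}
```

Values drawn this way could then drive pitch, duration, or synthesis parameters, as described above.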

Gary Lee Nelson has developed a number of pieces based on explorations of fractals in music that are implemented in Max. Aspects of Benoit Mandelbrot’s fractal theory that have attracted composers, among them Nelson, J.C. Risset, Michael McNabb, and Charles Wuorinen, include the property of self-similarity, in which forms are repeated on several scales simultaneously, and “strange attractors,” numbers in a fractal system that tend to exert a particular pull on the field around them. The application of these ideas to musical form has come in several guises; some researchers have taken fractal principles to the level of sound synthesis, where the forms of the pressure waves resemble the form of entire phrases and compositions (Yadegari 1991).

In Fractal Mountains, Gary Lee Nelson uses fractal graphs, whose method of generation ensures self-similarity on several levels. A collection of three graphs governs pitch, rhythm, duration, and density values in a microtonal, algorithmically generated environment. One graph controls pitch and rhythm by mapping the vertexes onto frequency (the y axis) and attack point (the x axis). Similarly, another graph affects the duration of notes through a function determining the number of sounds that should be heard simultaneously, affecting the overall density. The third graph maps onto dynamics, through MIDI velocity values. The entire system is made interactive by a technique that generates the graphs in real time from notes played on a MIDI wind controller. The pitch and velocity values from incoming wind controller events are used to place points on the graphs, which generate “fractal mountains,” and proceed to spawn new musical continuations based on the principles previously described.

6.3 Sequencing and Patterns

The most widespread applications yet developed for computer music are the class of programs collectively known as sequencers. A sequencer provides recording and editing capabilities for streams of MIDI data. More recently, tracks of audio information can be coordinated with the MIDI tracks as well. Several interactive systems have developed ways to incorporate sequences in performance, modifying aspects of the playback in response to live input. The Max programming language, for example, has a number of facilities for recording and using sequences.

Because of the normal use of commercial applications, sequences are usually thought of as successions of Note On and Note Off messages, with some occasional continuous controls and program changes thrown in. Stored strings of MIDI data not including Note messages can be used quite effectively in interactive performance, however: Roger Dannenberg’s composition Ritual of the Science Makers employs sequences in just this way (Dannenberg 1991). Rather than recording Note commands, sequences in Ritual record parametric changes. For example, values fed to digital signal processing algorithms are changed over time by a sequence of control information initiated at the appropriate moment in the performance. Another section of the work transforms pitches arriving from the ensemble of flute, violin, and cello into chords: these transformations, then, are varied through the section by a sequence of transposition values.

Sequences can be specified in several ways: one of the most common is to use a commercial application to capture and order MIDI data, then dump the results into an interactive environment through the use of the Standard MIDI File Format. Another way is simply to capture a stream of MIDI in real time during performance, as we will see Max’s seq object do presently. A third way relies on some form of text editing to specify events and their arrangement in time. This way in particular allows function calls and manipulation of program variables to be interspersed with Note and controller information. The text-based score language of the CMU MIDI Toolkit, called Adagio, supports the integration of function calls with more traditional score information. The qlist concept used at IRCAM in Max is another expression of the same idea.

Sequencing in Max

Figure 6.14


The basic sequencing object in Max is called, appropriately enough, seq. The seq object records a stream of MIDI messages coming from the midiin object and can play these back out in a variety of ways to the midiout object, as shown in figure 6.14. There are three message boxes attached to the inlet of seq, which will instruct it to perform these functions: hitting the record message will start seq recording the MIDI stream coming from midiin. Start makes seq start playing back whatever messages it has recorded, from beginning to end, and stop will halt either recording or playback.

Figure 6.15


The more extensive patch built around the seq object shown in figure 6.15 was developed for part of Dinu Ghezzo’s composition Kajaani Nights, premiered by the New York University Contemporary Players in November of 1991. The patch includes three sequencers, with separate record, start, and stop messages, as well as controls for independently varying the speed and transposition level of playback.
The record, delay, start, and stop message boxes are all connected to the left inlet of their respective sequencers; the connections are “hidden on lock” here to make the mechanism more easily visible. In the performance, an operator recorded material from an opening chorale, using the “3way Record” button at the top. Once the chorale was loaded into the sequencers, the separate controls for each were used to layer the material with various transpositions and speed changes.


The composer and performer Bruce Pennycook led the development of the MIDI-LIVE interactive computer music environment, an important component of which is the ability to record and play back sequences in performance. MIDI-LIVE grew through the realization of a series of five compositions, all part of the PRAESCIO series written by Pennycook. The name “praescio” came from a vision of interacting “prescient” partners, human and computer, each of whom would “‘know’ a portion of the musical material for the work but only during the moment of realization — the performance — would the completely formed piece unfold” (Pennycook 1991, 16).

Sequences for MIDI-LIVE were described with mscore, a version of Leland Smith’s score program. The sequences could be assigned to tracks accessible by MIDI-LIVE, and each piece specified conditions under which the tracks would be triggered by performer actions. MIDI note numbers, program change buttons, and foot switches were among the devices used to affect the state of MIDI-LIVE. A simple production system controlled the coupling between performer actions and machine responses. The production system consisted of <if-then> pairs, where the <if> conditions of the rules consisted of logical expressions on the state of performance devices (the foot switches, program change buttons, etc.), and the <then> actions were sets of program responses. For example, a simple rule in the production system might indicate that when a foot switch is depressed, the computer will record MIDI data coming from a keyboard, and when the foot switch is released, recording will cease. Production rules were themselves arranged in numbered sequences, where each rule in the sequence would have to fire before the next would become active.

The sequencing facilities offered in MIDI-LIVE enable some basic kinds of interaction between performers and prerecorded MIDI material. First, the performer controls the timing of recording and playback. The system does not implement tempo following, but the initiation time of any sequence is chosen by the performer. Further, controlled improvisation sections give the player access to a palette of sequences, which can be called up and played in any order. MIDI-LIVE allows extensive use of continuous control information to incorporate such devices as MIDI mixers in real-time performance. Finally, many simple transformations have been built in that can be used to vary the playback of any sequence track: output channel assignment, looping, transposition, tempo and velocity scaling, and harmonization are among them.

Sequencing in Cypher

Sequencing is little used in Cypher, primarily because the emphasis has been on the algorithmic techniques of generating an appropriate response in real time, rather than calling up a prerecorded response under certain conditions. The capability of using sequences in this way, however, is supported by the software, and the idea of launching fragments from a small library of possibilities when associated structures are encountered in the input is part of the orientation of the enterprise. The main implementational concern is that processing of the sequences should not degrade the performance of other listening and compositional tasks.

Sequencing was better supported in an earlier version of the program: the current implementation, however, is not far from providing the same facilities. The basic idea is this: A process that commences performance of a sequence file can be attached to any listener message. Such a process becomes a composition method like any other. Once initiated, a pointer to the sequence file is kept in a list of open sequences. Then, in the main Cypher loop, the sequence scheduling routine schedule_next_second() will be called.

The schedule_next_second() routine looks at the list of open sequences each time it is called. For each pointer in the list, one second’s worth of sequenced material is read from the file and sent to the scheduler. All sequencer files follow the standard MIDI file format; events in such files are time stamped with their offset from the previous event (as is done in Cypher). Therefore, schedule_next_second() can send file events to the scheduler, adding up successive offsets until one second’s worth of events have been read.

When one file has the required duration of events scheduled, the routine goes to the next pointer in the list until all sequences have been processed. The idea behind this handling of sequences is that Cypher makes a complete pass through its main loop at least once a second. If all open sequences have one second’s worth of events scheduled each time through, they are guaranteed to continue playing until the following execution of schedule_next_second(). The program will never get stuck scheduling all of a long sequence while other processing waits, however, because files are never read all at once (unless they are quite short to begin with). Because scheduling of sequence files is interleaved with all other processing, Cypher can handle the performance of several sequences simultaneously, without a noticeable degradation in its ability to respond to other live input. Similarly, the listener can be pointed at a sequence file, and respond to the music in it through the usual listener/player mechanism, even while it is playing back the original sequence itself.

6.4 Cypher’s Composition Hierarchy

We have seen the structure of Cypher’s listener hierarchy. Level-1 analyses of individual sounding events are passed up to level 2, which groups them and describes their behavior over time. On the composition side of the program, a similarly hierarchical function can be found. Methods on the first level operate, generally speaking, on individual sound events. Second-level methods are concerned with the direction and regularity of groups of events. This distinction between levels is blurred, perhaps even more so than is the case on the listener side. Level-1 methods often deal with local transitions between groups of two or three notes; level-2 methods could well end up affecting only a single event.

Level 2

Level-2 composition processes are invoked by messages arriving from the level-2 listener and affect the behavior of the composition section over phrase-length spans of time. The messages sent out from the listener have to do with grouping and regularity. The processes on the composition side are clustered around those types of messages and effect various kinds of change to the musical flow. One common strategy on level 2 is to make and break connections between features and transformations on level 1. This is a good way to achieve the complement to some kind of behavior observed by the listener. Reports of regularity for some feature, for example, could be met by processes generating change in the same domain. Another strategy is to mutate the level-1 transformation modules; control variables relevant to the transformation method can be changed according to the nature of the input being reported by the listener and the current state of the variable. For example, the accelerando module can be made to accelerate its input more or less than it already does as a function of the speed feature found by the listener and the accelerando rate currently active.

Level 2 is also the appropriate place to perform processes of expressive variation, which extend across groups of sounding events. Variation in timing or loudness can be used to accentuate structural boundaries between events as they are produced. The ideas of second-level composition are closely allied with the concerns of level-2 analysis: the regularity, or types of change, of collections of events; the grouping together of such collections; and the direction of regular change are attributes to be generated in the music emanating from processes on the second level of the composition section.

Descriptions of implemented level-2 composition processes follow:


VaryDensity

This process is used to establish and break level-1 connections between the density features and certain transformations. It will be invoked by whichever level-2 listener message is connected to it; then the process examines the featurespace classification to determine if the current input is a chord or single note. If it is a chord, a level-1 connection will be drawn between the chord feature and the arpeggiation transformation. If the input is a single note, a level-1 connection will be drawn between the line feature and the chord transformation. In other words, the density of the input to the listener will be reversed in the Cypher composer’s output.


UndoDensity

This method disconnects all level-1 methods from the vertical density features, so that no transformations will be invoked from the line or chord classifications. This method can be used, for example, to cancel the effect of VaryDensity. Regular behavior of some feature could be used to call VaryDensity, establishing density-activated transformations on level 1. Irregular behavior of the same feature might be tied to UndoDensity, causing all level-1 density-activated transformations to cease.


Primarily used as a way to mark phrase boundaries. When invoked, the routine sends out a loud MIDI event and adds a significant delay to the offset of an event block about to be played, emphasizing a potential phrase boundary. The method is designed to be triggered from level 2; that is, whenever the phrase agency detects a phrase boundary, a message identifying the boundary will be sent out. If this method is hooked up to that message, it will announce the arrival of the boundary with an audible bang.


JigglePitch

This method sends out a single event, containing one pitch placed randomly anywhere in the keyboard range, every seventh time it is invoked. All other invocations will have no effect. JigglePitch is most often used for “jiggling” composition methods that may be stuck in a repetitive pitch pattern or area and so can be profitably connected to level-2 listener messages denoting regularity.


MakeBass

MakeBass establishes a connection on level 1 between the all message and the basser transformation module. Connecting a player process to the all message means that the player process will be called for every input event, no matter what its features. The basser module queries the chord agency for the root of the current harmonic area, then plays out a single event, containing the root pitch, in the octave two below middle C.


BreakBass

The all message on level 1 is disconnected from the basser transformation module. BreakBass acts as the complement to MakeBass, undoing the effect of that module. For this reason, the basser module is only disconnected from all; any other level-1 features linked to basser will remain in force.


BeatPlay

This method looks at the period of the current beat theory and schedules bass notes to be played on the beat. If input is arriving, the beat period will be constantly adjusted to match the most recent theory. If no more external information is arriving, bass notes will be continued, using the period from the most recently calculated beat theory.
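This behavior can be sketched as a small periodic task. The structure and function names below are invented for illustration; only the logic of tracking the latest beat theory and continuing at the old tempo when input stops comes from the description above.

```c
#include <assert.h>

typedef struct {
    long period_ms;   /* period of the current beat theory */
    long next_beat;   /* absolute time of the next scheduled bass note */
} BeatTask;

/* Called once per pass: emit a bass note whenever its time arrives,
   always adopting the most recently reported beat period (0 means no
   new theory), so the task keeps running at the old tempo if input
   stops. Returns 1 if a note was played on this call. */
int beat_play(BeatTask *b, long now, long latest_period)
{
    if (latest_period > 0)
        b->period_ms = latest_period;  /* adjust to the newest theory */
    if (now >= b->next_beat) {
        b->next_beat = now + b->period_ms;
        return 1;                      /* a bass note was played */
    }
    return 0;
}
```

A BeatStop-style complement simply deletes or deactivates the task, which is why the two methods pair so naturally with regular/irregular listener messages.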


BeatStop

The complement to BeatPlay, this method will stop any currently scheduled beat task from continuing. Again, these methods are paired and are often hooked to opposite poles of irregular/regular messages for some feature. For example, a report of regular speed could trigger the beat player, and a subsequent report of irregular speed could be attached to BeatStop, causing all beat activity from the player to stop.


AccelMess

This is the first of a series of mutator processes, which take advantage of mutation facilities built into the transformation modules. AccelMess sends a mutate message to the accelerator level-1 module to vary the rate of acceleration performed. The new values sent will follow the form of a ramp: they will increase linearly to a maximum of around 450 (which will be the number of milliseconds subtracted from event offsets by the accelerator), jump back down to 100, and increase slowly from there again.
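The ramp of mutation values might look like the following sketch. Only the 100-to-450 range comes from the text; the step size is an invented parameter.

```c
#include <assert.h>

#define RAMP_MIN  100
#define RAMP_MAX  450
#define RAMP_STEP 50   /* invented increment per mutate message */

/* Each call returns the next acceleration amount (milliseconds to be
   subtracted from event offsets): a linear rise to RAMP_MAX, then a
   jump back down to RAMP_MIN to climb again. */
long next_accel_value(void)
{
    static long value = RAMP_MIN;
    long out = value;
    value += RAMP_STEP;
    if (value > RAMP_MAX)
        value = RAMP_MIN;   /* wrap: jump back down and climb again */
    return out;
}
```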


SawMutate

SawMutate is another mutation process, this time sending messages to the level-1 sawer module. Sawer will change the width of sawtooth-like ornamentations in response to mutate messages. SawMutate changes this width randomly, up to a maximum of two octaves.


One easily detected, but nonetheless crucial, condition for the program to notice is when the music coming from the outside world has stopped. Among the possible causes of such a condition are that the
piece is over, players are stopping a rehearsal to discuss the performance, the building is on fire, or that it is time for the computer to take a solo. Cypher is an enthusiastic soloist: it always takes the absence of other input as a signal that it should play more.

Since the most common generation method for the composition section involves transforming received input, the absence of any input would seem a crippling limitation. When performing alone, Cypher’s composer continues generation by transformation through the simple expedient of transforming its own output. This method of soloistic generation I call composition by introspection. Music is generated from a controlled feedback loop: the player produces some output, which is analyzed and characterized by a listener. The listener sends messages to the player about what it has heard (in other words, what the player just produced), and the player is instructed, through the connections between listener messages and composition methods, to transform its own previous output in some way. The feedback path from the player to the listener is indicated with a heavy black line in figure 6.16.

Figure 6.16

Introspection is a way for the user to control the connection of listener messages to composition methods. This facility provides another type of interaction with the program, allowing a human director, or a script of connection sets, to regulate the performance of Cypher during computer solos. Applying the transformations to their own output turns out to be an interesting way to observe their effect. Feeding back on themselves, many transformations lead to registral, temporal, or dynamic extremes, where they will remain until something disrupts the state. Using the connection mechanism to reorder the modules called by different configurations of the featurespace provides just such a disruption. Another approach is to mutate the low-level transformations when level-2 analysis finds features behaving regularly. Pinned behaviors are flagged as regular, and a subsequent mutation of the transformations, ordered by the level-2 player, sends output off in another direction.
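The feedback loop and its tendency toward pinned extremes can be caricatured in a few lines. Everything here is an invented stand-in: a one-dimensional "listener" that classifies register, and a transposition standing in for a real transformation module.

```c
#include <assert.h>

typedef int Event;   /* an event reduced to a single pitch, for illustration */

Event transpose_up(Event e) { return e + 2; }   /* stand-in transformation */

/* "Listener": characterize the player's own output; here, trivially,
   whether it has reached a high register. */
int listener(Event e) { return e >= 72; }

/* One pass of introspection: the last output is analyzed, and the
   resulting message selects a transformation of that same output.
   Unchecked, the transposition drives the output to a registral
   extreme; the "high" message acts as the disruption that sends it
   off in another direction. */
Event introspect_step(Event last_output)
{
    int high = listener(last_output);
    return high ? last_output - 12 : transpose_up(last_output);
}
```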

The Critic

Any number of distinct player streams can be assigned their own Cypher listener. A listener is called with a new event and a pointer to some stream’s analysis history. A player stream can emanate from any MIDI source, such as that coming from a human performer, from another computer program, or from Cypher’s own player. Cypher’s current architecture maintains two listeners, each with its own history. One listener is tuned to MIDI input arriving from the outside world; the other is constantly monitoring the output of the composition section.

We have just reviewed in some detail the process of analyzing a MIDI source, sending messages to the composition section, and producing novel musical material in response. The second listener shown in
figure 6.16 represents a compositional “critic,” another level of observation and compositional change, designed to monitor the output of the Cypher player and apply modifications to the material generated
by the composition methods before they are actually sent to the sound-making devices.

Interactive music systems typically have no way of evaluating their output — the composer/user of such systems steers the program in one direction or another without any way of measuring success other than by ear. One reason the problem has been so neglected is that it is treacherous; evaluating musical output can look like an arbitrary attempt to codify taste. I choose to look at it another way: developing a computer program with a capacity for aesthetic decision making, though it is certainly arbitrary, is interesting in its own right and is valuable for the quality of information that becomes available to the human developer in further evolving the program.

The information provided by the “critic” listener forms the foundation of a rule set governing the featural attributes, grouping, and regularity in the output. The critic functions as a production system: a set of rules controls which changes will be made to a block of musical material exhibiting certain combinations of attributes. Tracing the analysis produced by the critic listener and the changes introduced by the production system, an informed evaluation can be made both of the music being produced by the composition section and of the effectiveness and stylistic traits of the critic.

When a MIDI event has been analyzed by the listener, it is copied into a scratch event block for processing by the composition section. According to the connections established between listener agents and composition methods, events in the block will be modified, or new events will be added. It is possible, particularly during introspection or in communication with another computer program, that events will arrive more quickly than they can be processed. Even more common is the case that the program will generate more data than the sound synthesis gear can produce. For this reason, the program will sample input that is too dense to be treated successfully.

An important function of the critic is to find “interesting” material in the output of the player for further manipulation by the composition methods. When performing introspectively, Cypher often must sample material generated by the player. When tracking a human performer, the program is almost always able to keep up with the rate of events arriving for evaluation and response. When the program begins to perform through introspection, analyzing and transforming its own output, however, data can easily begin to arrive with so much density that sending out transformations of it all would swamp both the synthesis gear and the comprehension of the listener. (The same observation holds for any input with a high enough density of events; introspective composition is the most common case, but others, such
as input arriving from another computer program, are handled in the same way.) Therefore, the critic needs to select events from the wealth of material emanating from the player, which are then analyzed and used to germinate a fresh round of compositional processing.

To select interesting events for further processing, the critic uses the phrase boundary messages coming from the “critic” listener. A listener will identify phrase boundaries in a musical stream by monitoring discontinuities across feature classifications between two events. When discontinuities have arisen in enough features, the phrase agency reports a boundary. The critic listener performs this operation on material coming from the player. Events that the listener identifies as beginning a new phrase are considered “interesting” by the critic and are sent back to the “analysis” listener for another round of evaluation and response.

It may well happen that no phrase boundary is noticed in a block of material. In that case, the critic will select the event with the most differences from its predecessor. Essentially the same criterion is being used: events with many discontinuities relative to the previous event will be marked as interesting. If several events have the same level of difference from their predecessor, the earliest such event will be used.
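The selection rule just described can be written compactly. The sketch below assumes the listener has already attached a boundary flag and a discontinuity count to each event; the structure and names are illustrative, not Cypher's actual data types.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int phrase_boundary;  /* nonzero if the listener flagged a boundary */
    int discontinuities;  /* feature differences from the predecessor  */
} EventInfo;

/* Return the index of the "interesting" event in a block: the first
   phrase boundary if any, otherwise the earliest event showing the
   greatest difference from its predecessor. */
size_t select_interesting(const EventInfo *ev, size_t n)
{
    size_t best = 0;
    for (size_t i = 0; i < n; i++)
        if (ev[i].phrase_boundary)
            return i;
    for (size_t i = 1; i < n; i++)
        if (ev[i].discontinuities > ev[best].discontinuities)
            best = i;   /* strict '>' keeps the earliest event on ties */
    return best;
}
```

Note that both branches apply the same criterion, discontinuity with the predecessor; the phrase boundary is simply the case where the discontinuities were numerous enough for the phrase agency to report.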

Already in such a simple application of the critic, we can see that the rules associated with it form, in effect, an aesthetic bias for the program. Declaring events whose feature classifications differ markedly from their predecessor to be “interesting” is no more or less than an aesthetic choice; there is no inherent reason to find any musical event more interesting than any other. Aesthetic biases can be codified in terms of preferred actions to be taken in response to material exhibiting certain kinds of regularity, or feature classifications, or harmonic behavior; in short, preferences can be established for all of the situations reported out of the listener.

Aesthetic Productions

The rule set associated with the critic forms just such a collection of aesthetic preferences, applied to the incipient output of the player. The rules are expressed in the form of a production system: a set of condition-action pairs, where the condition parts are expressed in terms of logical operations on listener messages and the action parts consist of transformations to be applied to the musical material. The productions are applied just before the material is sent to the synthesizers. When events are first scheduled, there is no way of knowing how much additional material will be scheduled to be played at the same time. For that reason, whenever the composition section schedules some new output, it is first sent to a routine called Gather(), which will consolidate events destined for the same point in time. Further, Gather() will arrange for the consolidated material to be sent through the critic one clock tick before the time comes for it to be played.

Here is a simple example: the critic will prefer to reduce the vertical density of material being presented at a high speed (high horizontal density). Such a rule can be expressed as

if (FastVal(featurespace) > 2) Thinner( … ).

The critic first computes the featurespace and regularity words for the event. A set of routines such as FastVal() is available to examine the classifications returned for any of the level-1 or level-2 features and regularities. In this case, FastVal() returns the speed classification assigned to the current event by the level-1 analysis. If this value is greater than 2, the speed of the event has reached the highest possible rate. In that case, the material will be reduced through application of the transformation module thinner.
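A runnable version of this production might look as follows. The signatures of FastVal() and Thinner(), the packing of the featurespace word, and the particular thinning strategy (dropping every other pitch) are all assumptions for illustration.

```c
#include <assert.h>

typedef unsigned featurespace_t;

/* Assume the speed classification occupies the low two bits. */
int FastVal(featurespace_t fs) { return (int)(fs & 0x3); }  /* 0..3 */

/* Stand-in for the thinner module: drop every other pitch from a
   chord, reducing its vertical density; returns the new count. */
int Thinner(int *pitches, int n)
{
    int kept = 0;
    for (int i = 0; i < n; i += 2)
        pitches[kept++] = pitches[i];
    return kept;
}

/* The production: fast material gets vertically thinned. */
int apply_density_rule(featurespace_t fs, int *pitches, int n)
{
    if (FastVal(fs) > 2)
        return Thinner(pitches, n);
    return n;
}
```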

The aesthetic productions held in the critic are now static; that is, Cypher has a particular style that it will enforce on the music it plays. In a full implementation, the critic’s rule set should be swapped in and out as a function of the style of music being played to the system. Rather than having a simple set of rules applying to the output of the program generally, specific stylistic rule sets could be invoked for known genres arriving from the outside world.

An extension to the static critic implements this idea: a further set of production rules can be invoked from the interface, or through a command in a score-orientation script. When the additional rules are used, several changes are made to the responses stored in the connections manager according to the characteristics of the music being put out by the composition section. This is another way to approach the manipulation of the player: rather than editing the program’s output directly, the way the output will be generated in the first place is changed. Possible manipulations programmed into the action part of these productions include making and breaking level-1 connections, changing the sound banks and timbres used, and reducing the length of time the program will wait before beginning to perform introspectively.

Combination Techniques and the Solo Method

In the preceding review of composition methods, I have roughly divided the algorithms presented into three classes: processes of transformation, of generation, or of sequencing. Another possibility, however, is to construct compositional strategies that combine aspects of all three classes; human improvisation certainly arises from a combination of these. The methods employed most by humans and Cypher are nearly inverted, in that the technique which is most difficult for humans (immediately adopting and transforming the material of others) is what Cypher does best and most often. Human improvisers rely much more heavily on remembered sequences, which can be called up and adapted to different performance situations, and on the manipulation of sets of basic elements (scales, rhythmic patterns). In this section, I will first describe an implemented composition method that incorporates elements of all three algorithmic styles. Then I will review techniques and motivations for building composite players out of several compositional agents.

The solo method is available as a level-1 module; however, the algorithm it implements is not a transformation of material presented at the input, as are the other level-1 modules. The only relations solo’s output bears to the events in the input block are that it is harmonically related to them, and that its horizontal density in time will tend to increase and decrease along with the density of the input material. In that respect, solo’s behavior is related to the transformative class of algorithms we have already extensively considered; it is not, however, strictly a transformation, since the input events themselves remain unchanged. Their only effect is to guide the generation of the module, through their harmonic content and horizontal density.

Solo is a hybrid of the transformative and generative styles: we have already seen the sense in which it is transformative. In fact, that part of the algorithm might more properly be termed simply responsive, since it looks at the input, but does not change it. The output of the module is a monophonic melody; the part of the algorithm that produces pitches generates them from an array of interval preferences, matched against reports coming from the chord agency. This part of solo’s operation is more generative, the second algorithmic style included in the module’s hybrid form.

Each time it is called, solo adds between zero and four new events to each event in the input block. The offset between events will be chosen at random between 20 and 50 centiseconds. An array of intervals helps determine the pitches of the new events, through the following calculation:

next = (WhichChord(DUMP, 0)/2) + 72 + melody[rand() % 5];

First, the root of the current harmonic area is determined by querying the chord agency. The answer returned is divided by 2, to discount the mode of the harmony. Then, the root is added to 72, to place the activity of the solo module at least two octaves above middle C. Finally, one of the intervals from the melody array is chosen at random and added to the other elements. If the pitch calculated for any event is the same as the pitch of the event before, it is shifted down one half step.

The offsets for all the new events are added together and recorded. If the solo module is called again before all of the events from the most recent invocation have been performed, the module returns without producing any new events. It is this part of the algorithm that tends to match the horizontal density of solo to the density of the activity around it. Output from the module will never overlap itself; all of the events from one invocation are guaranteed to finish before the next group begins. If the input events are quite sparse, some time will pass before a new event spurs fresh output from solo. For inputs beyond a certain density level, however, solo’s output will be more or less continuous; a new event triggering more response will present itself quite quickly after any given group has finished playing.
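Solo's pitch calculation and overlap guard can be sketched together. The contents of the melody array, the encoding of the chord root (even/odd values distinguishing mode), and all helper names beyond those given in the text are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative interval preferences; only the array's existence and
   its five entries (matching rand() % 5) come from the text. */
static const int melody[5] = { 0, 2, 4, 7, 9 };

static long busy_until = 0;  /* time when the last event group ends */

/* Compute one solo pitch: halve the root to discount mode, add 72 to
   sit at least two octaves above middle C, add a random interval, and
   shift down a half step if the result would repeat the last pitch. */
int solo_pitch(int chord_root, int prev_pitch)
{
    int next = (chord_root / 2) + 72 + melody[rand() % 5];
    if (next == prev_pitch)
        next -= 1;
    return next;
}

/* Overlap guard: refuse to start a new group until the previous one
   has finished, which couples solo's density to that of its input. */
int solo_may_play(long now) { return now >= busy_until; }
void solo_group_scheduled(long now, long total_offsets) { busy_until = now + total_offsets; }
```

The guard is the whole density-matching mechanism: sparse input leaves long gaps before a new event can spur output, while dense input finds the module free again almost as soon as each group ends.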