Chapter 4 – Knowledge and Laws – Informed Machine Composition

5. Cypher: an overview

Cypher (C), as presented in Robert Rowe's “Interactive Music Systems: Machine Listening and Composing” (1993), has become a paradigm system for MC. It has been used numerous times in teaching, theory and practice.

Components

Cypher has been called an 'interactive computer music' system by its author because it falls between the categories of composition and performance. In Rowe's classification system, C is a performance-driven, transformative and algorithmic player system. It consists of two main components: an analysis section, or Listener (L), and a composition and player section, or performer (P). C does not work on stored material or scores. It analyzes free-form input and generates output using algorithmic styles that define C's personality (see on meta-styles or aesthetics below).

Elements

C's components consist of features analyzed by its L and transformations that relate C's input to the output of P. On the first level [L1],349 L classifies the features of density, speed, loudness, register, duration and harmony, assigning each tone or sonic event a unique point in a six-dimensional feature space (L1), for instance [low, fast, very loud, legato, a minor].350 On the next level [L2], L looks for changes of these features over time (temporal feature functions).

On higher levels, C looks for phrase length in the input and for regularities or irregularities of different lower-level features. In a sense we might look at the L1 feature space as the range of functions for MIDI notes, while higher-level functions are first- and higher-order derivatives of these functions.

Resulting vectors or feature spaces are defined within restricted value ranges such as 'loud' and 'soft', using thresholds for classification. Scaling features down to a few values results in a rather coarse gradation, but it also leads to fewer categories, i.e. clusters of points in the feature space or cube.

“Further, higher listening levels will use the feature space abstraction to characterize the development of each feature's behavior over time”351

Using “fuzzification” of MIDI-data on every higher level, data resolution is reduced to practical sizes for processing.352
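
The threshold-based classification described above can be illustrated with a minimal Python sketch. It is not Rowe's implementation: the threshold values, the label names other than those quoted in the text, and the restriction to three of the six features are assumptions made for illustration only.

```python
# Hypothetical thresholds: each pair is (upper bound, label).
# Cypher's actual values are not given in the text.
REGISTER = [(48, "low"), (72, "middle")]   # MIDI pitch; above: "high"
LOUDNESS = [(50, "soft"), (100, "loud")]   # MIDI velocity; above: "very loud"
SPEED = [(150, "fast"), (600, "medium")]   # inter-onset interval in ms; above: "slow"


def classify(value, bounds, top_label):
    """Return the label of the first band whose upper bound exceeds value."""
    for upper, label in bounds:
        if value < upper:
            return label
    return top_label


def feature_point(pitch, velocity, ioi_ms):
    """Map one event to a coarse point in a (reduced) feature space."""
    return (
        classify(pitch, REGISTER, "high"),
        classify(velocity, LOUDNESS, "very loud"),
        classify(ioi_ms, SPEED, "slow"),
    )


print(feature_point(40, 120, 100))  # ('low', 'very loud', 'fast')
```

With only three labels per feature, many different events fall on the same point, which is the coarse gradation and clustering the text refers to.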

P is defined by its methods of response, e.g. initialization of algorithms under specified conditions (variables). Response methods are transformations to input notes such as transposition, delay, inversion and acceleration (see echo in ch3).

Users of C will function as “composers”, performers or both when they define and tune the values of the connections between features of various levels and their transformations and algorithms (a total set of these values is called a state). In practice this is done by drawing lines between L1 and L2 features and transformations. An important mode and component of C is its rehearsal mechanism, which permits the restoration of whole states saved in earlier sessions, i.e. memorizing successful “takes” in repeated rounds of “playing around” with C. States are metadata of C that comprise variables for connections, sound choices etc. Sets of states are stored in performance files. Successions of state changes, controlled by cues in the input, may finally represent, not unlike scores, a complete performance of a work.

Architecture and inner fabric: hierarchies and progressions

Rowe is inspired by Minsky, Meyer and Narmour. In 'The Society of Mind', Minsky develops his theory of the mind as built around networks of partly autonomous agents. It is a non-hierarchical model of meaning and intention that presupposes somewhat parallel representations and perspectives353 taken by “freelance” agents that participate in a sort of “grand symphony”. This view is very much in line with Narmour's systemic implication-realization model, implemented as networks354 of musical events on various levels, ordered horizontally and compatible with listening expectations.

So are the intuitions of others, like Dewey and Meyer, who treated events in time-directed perspectives, ideas that Rowe implemented in Cypher:

Eugene Narmour's theory tends to assume goal-directed, expectation-based model of music cognition.355

Listener:

Rowe uses ideas from many strands of music theory. For instance, he treats low-level objects as collections instead of rigid structures. Higher levels “use the abstractions and results produced of lower levels”, and the higher a level, the longer its structures span in time. C may be described as a nonuniform hierarchy with expectation-based dynamics. There is certainly less hierarchy than proposed by GTTM (above), and there is no counterpart to its well-formedness and preference rules. About the progressive perspective Rowe writes:

The progressive perspective is adopted on other structures as well; harmonic progressions, patterns of rhythm or melody, and higher level groups of Cypher events all are related, at times, by the operations of succession and precedence... All events are connected in a hierarchy and simultaneously tied together in relations of succession and precedence... Many prominent music theories devise a single structural perspective within which to describe musical behavior...356

This multi-structural perspectivism is also reflected in Rowe's liberal or eclectic use of methodology. In harmonic and rhythmic analysis he uses connectionist-like principles from the ANN approach. To find the root and mode of a section of a piece, twelve input nodes (one for each semitone in MIDI) update 24 chord theories (12 major and 12 minor) with either negative or positive increments based on the above-mentioned tonal principles, adjusted by Rowe through trial and error.357 Similarly, a C agent will update the scores of 24 key theories on higher levels, with negative increments or weights contributing more than positive key weights.358 Winning theories of chords and keys are associated with confidence values that measure the strength of the winning theory relative to the total strength of the surrounding scores.
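
As a rough illustration of how twelve pitch-class inputs can update 24 competing chord theories, consider the following Python sketch. The increment values and the confidence formula (winner score divided by the sum of all positive scores) are assumptions chosen for readability; Rowe's hand-tuned weights are not reproduced here.

```python
# Hypothetical increments, indexed by the interval (in semitones) between
# the incoming pitch class and the root of a theory.
MAJOR_INCREMENTS = {0: 4, 4: 3, 7: 3}   # root, major third, fifth
MINOR_INCREMENTS = {0: 4, 3: 3, 7: 3}   # root, minor third, fifth
PENALTY = -2                            # any pitch class foreign to the triad


def new_theories():
    """Create 24 chord theories (12 roots x 2 modes), all scored zero."""
    return {(root, mode): 0 for root in range(12) for mode in ("major", "minor")}


def update(theories, midi_pitch):
    """Let one incoming note reward or penalize every theory."""
    pitch_class = midi_pitch % 12
    for root, mode in theories:
        interval = (pitch_class - root) % 12
        table = MAJOR_INCREMENTS if mode == "major" else MINOR_INCREMENTS
        theories[(root, mode)] += table.get(interval, PENALTY)


def winner(theories):
    """Return the best theory and an assumed confidence value in [0, 1]."""
    best = max(theories, key=theories.get)
    positive_total = sum(max(score, 0) for score in theories.values())
    confidence = theories[best] / positive_total if positive_total > 0 else 0.0
    return best, confidence


theories = new_theories()
for pitch in (57, 60, 64):        # A, C, E
    update(theories, pitch)
print(winner(theories)[0])        # (9, 'minor'), i.e. A minor
```

The same scheme could be repeated one level up, with winning chords acting as inputs to 24 key theories.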

Like music theory, C uses information from other agents (density, register, beat and others) to improve the results of the analysis. For example, density agents may inform a chord agent about the sufficiency of constituent chord parts, or register agents may help chord agents to assign greater weight to the lowest tone in chords. Beat agents contribute by weighting events on the beat by a factor of 1.1. The interdependency of chord and key classification, characterized above as a vicious cycle, is handled in C with information fed back from key agents to chord agents, similar to what music theory prescribes.

Rhythm analysis is called beat-tracking359 in C. Beat agents find the lowest level of Cooper and Meyer's classic three-level analysis360 of hierarchic rhythmic activity: pulse (a regularly recurring succession of undifferentiated events). Starting out with no knowledge about the expected beat pulse, C needs a dynamic approach to succeed in real time for interactive use. Again C is somewhat related to connectionist models, maintaining many theories of possible periodicities in parallel.

In the multiple theory algorithm, separate theories are maintained for all possible centisecond offsets within this range [288 to 1500 milliseconds]; in other words, offsets from 28 to 150 centiseconds (a total of 123 possibilities) are regarded as possible beat periods. (page 144) ...

The first thing the tracker does is to examine the expected event arrival times of all theories. If the real arrival coincides with an expected arrival for any nonzero theory ... , points are added to that theory's score. ... If the real offset arrives later than an expected offset, points are subtracted from that theory's score. The heuristics here is that syncopations are unlikely; that is true beat pulses will usually have events aligned with them. 361
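
The reward-and-penalty logic of the multiple-theory tracker can be approximated in a few lines of Python. This is a strongly simplified sketch, not Rowe's algorithm: it compares each inter-onset offset with each candidate period instead of maintaining projected arrival times, it omits the syncopation heuristic, and the tolerance and point values are assumptions.

```python
PERIODS = range(28, 151)   # candidate beat periods in centiseconds (123 theories)
TOLERANCE = 2              # hypothetical matching window in centiseconds


def score_theories(onsets_cs):
    """Score every period theory against a list of onset times (centiseconds)."""
    scores = {period: 0 for period in PERIODS}
    for previous, current in zip(onsets_cs, onsets_cs[1:]):
        offset = current - previous
        for period in PERIODS:
            error = abs(offset - period)
            if error <= TOLERANCE:
                # Arrival coincides with the expectation: reward, more if closer.
                scores[period] += TOLERANCE + 1 - error
            elif offset > period:
                # Arrival is later than expected: penalize the theory.
                scores[period] -= 1
    return scores


def best_period(onsets_cs):
    """Return the period (in centiseconds) of the highest-scoring theory."""
    scores = score_theories(onsets_cs)
    return max(scores, key=scores.get)


print(best_period([0, 50, 101, 150, 200]))  # 50
```

Because all theories are scored in parallel on every event, slightly uneven playing still converges on the period that explains most arrivals.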

Again thresholds fuzzify incoming data into a manageable resolution. The problem of “false negatives”, or syncopations, is then handled by a 'syncopation heuristic' part of the algorithm that memorizes and updates the five most likely interpretations of incoming event placements relative to a factorized pattern. The underlying logic of this agent is that geometric subdivisions or multiples (or, inversely, divisions) by 'two' and 'three' span networks with “good” places (having simple integer ratios) that we may use for predictions or 'theories', which are ultimately rewarded or penalized according to the real 'arrivals' or evidence. Aside from handling syncopations as natural exceptions, a beat tracker must also catch the larger-scale time deviations or tempo fluctuations that carry expressive intentions. Many smaller-scale fluctuations parallel bodily dynamical contours or shapes.

Tempo fluctuations are tracked using continual factorizational adaptations362. C does not, however, detect meter or higher-level rhythmic organization.

Both vertical key analysis and horizontal beat analysis can be informed by knowledge about how events are segmented into sequences. In C, grouping of events is done by a phrase boundary agency that uses discontinuities between classified features as its criterion:363

In Cypher, phrases are musical sequences, commonly from around two to ten seconds in duration, that cohere due to related harmonic, rhythmic, and textural behaviors. The level-2 listener detects boundaries between phrases by looking for discontinuities in the output of the level-1 feature agents. Each agent is given a different weight in affecting the determination of a phrase boundary; discontinuities in timing, for instance, contribute more heavily than differences in dynamic. The phrase boundary agent collects information from all the perceptual features, plus the chord, key, and beat agencies. When a discontinuity is noticed in the output of a feature agent, the weight for that feature is summed with whatever other discontinuities are present for the same event. When the sum of these weights surpasses a threshold, the phrase agent signals a group boundary.
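
The weighted-sum rule in this passage lends itself to a short Python sketch. The feature names, weights and threshold below are hypothetical; the text only states that timing discontinuities weigh more than dynamic ones and that a boundary is signalled when the summed weights surpass a threshold.

```python
# Hypothetical weights per feature agent; timing counts more than dynamics.
WEIGHTS = {"timing": 3, "register": 2, "harmony": 2, "dynamic": 1}
THRESHOLD = 4  # hypothetical


def is_phrase_boundary(previous, current):
    """Sum the weights of all features that changed between two events."""
    total = sum(
        weight
        for feature, weight in WEIGHTS.items()
        if previous[feature] != current[feature]
    )
    return total > THRESHOLD


before = {"timing": "fast", "register": "low", "harmony": "a minor", "dynamic": "loud"}
after = {"timing": "slow", "register": "high", "harmony": "a minor", "dynamic": "loud"}
print(is_phrase_boundary(before, after))  # True: 3 + 2 surpasses 4
```

A change in dynamics alone (weight 1) would stay below the threshold and leave the current phrase open.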

Composer and player:

The player component outputs what the composition section has transformed and generated. A configuration of C tells the player how to respond to the results of analyses of input to the listener.

The response methods are the agents of the composer in C and they operate at different levels, similar to listening agents. Configuring C means establishing and defining connections between features of L1..Ln and the various generative and especially transformative methods of response.

Transformation is done in C by chaining many small response modules or objects, very much like listening is accomplished by small interacting agents. They can be reused several times in a loop or later in the chain. Such object chains result in rather complex but deterministic output to the player.
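
Chaining small deterministic modules can be sketched in Python as function composition over a list of notes. The module names echo three of Cypher's level-1 objects, but their behavior here, the note representation as (pitch, velocity, duration) tuples, and the argument conventions are illustrative assumptions only.

```python
from functools import reduce


def transposer(semitones):
    """Build a module that shifts every pitch by a fixed interval."""
    def apply(notes):
        return [(pitch + semitones, vel, dur) for pitch, vel, dur in notes]
    return apply


def louder(amount):
    """Build a module that raises velocity, clipped to the MIDI maximum 127."""
    def apply(notes):
        return [(pitch, min(vel + amount, 127), dur) for pitch, vel, dur in notes]
    return apply


def backward(notes):
    """Module that reverses the order of the notes."""
    return list(reversed(notes))


def chain(*modules):
    """Compose modules so the output of each feeds the next."""
    def apply(notes):
        return reduce(lambda acc, module: module(acc), modules, notes)
    return apply


response = chain(transposer(12), louder(10), backward)
print(response([(60, 100, 200), (64, 120, 200)]))
# [(76, 127, 200), (72, 110, 200)]
```

Since every module is a pure function, the same input always yields the same output, and a module can appear more than once in a chain.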

The behavior of response objects can be tuned with arguments that vary their precise transformational algorithms.364 There is a clear and probably intended similarity between response objects in C and objects in MAX. Rowe uses MAX patches in his two books to illustrate the structures of transformations and generations implemented in the C language for Cypher. These are the Level-1 objects that transform and generate material:

Accelerator Gracer Sawer

Accenter Harmonizer Solo

Arpegiator Inverter Stretcher

Backward Looper Swinger

Basser Louder Thinner

Chorder Obbligato TightenUp

Decelerator Ornamenter Transposer

Flattener Phraser Tremolizer

Glisser Quieter Triller

These objects start to generate output when they receive the message continue. The process is then repeated with increasing or decreasing values for each call until a limit or its defined overall duration is reached. Second-level methods, somewhat analogous to the second level in listening, settle the degrees of direction and regularity in groups of events or sequences. Level-2 methods are usually triggered by messages from level-2 listeners and operate on the level of phrases. But level-2 objects also autonomously (i.e. without the interference of the user) establish, change (mutate) and break connections between L1 features and L1 transformations (example365).

The level-2 composition objects are:

VaryDensity MakeBass AccMutate

UndoDensity BreakBass SawMutate

Phrase BeatPlay

JigglePitch BeatStop

Meta Listener: Critic

C typically interacts with a human player, i.e. it listens to human input, interpreted with built-in preferences and defined connections. This chamber-musical personality switches to soloist, prima donna behavior when input or its own transformations stop: C autogenerates novel output coherent with its own previous output. Rowe calls this “composition by introspection”. It means that C can reflect on its own compositional output. It can also be influenced by a human “director” who regulates the solo performances. In such cases it resembles M. A critic that listens to the player's output and changes the configuration for solo performances on the fly works like a controlled feedback loop. Rowe illustrates:

Feeding back on themselves, many transformations lead to registral, temporal, or dynamic extremes, where they will remain until something disrupts the state. Using the connection mechanism to reorder the modules called by different configurations of the featurespace provides just such a disruption. Another approach is to mutate the low-level transformations when level-2 analysis finds features behaving regularly. Pinned behaviors are flagged as regular, and a subsequent mutation of the transformations, ordered by the level-2 player, sends output off in another direction.366

4.6 Other informed MC systems

Cypher has been reviewed as a sort of prototype of MC systems. We now widen the perspective with presentations of other informed MC systems that share its fundamental structure and outlook. In ch3 we saw how transformative objects in MAX could produce complex structured output. Both delay and transpose are echoed in the L1 methods of C, but in ch3 they were not implemented with the hierarchical and systemic integration found in C, where analysis at the listening and critic agent levels gives them interactive and self-referential meaning. Neither M nor JamFactory was informed in the sense of musicianship that captures and manifests musical concepts. M is highly interactive, just like C, and it lets the user configure massively algorithmic interfaces that control rather abstract feature spaces of sound containing moderate musical meaning. The algorithmic expressivity of MAX, on the other hand, allowed the construction of musically informed patches. A well-known strategy is to employ data structures of MAX, such as tables, to define musical forms, as Rowe demonstrates with Guido's method367 for composing “medieval” chant and isorhythms.368

Music Theater (MT) in 'Cybernetic Music'

In a book titled 'Cybernetic music'369 Jaxitron (a pseudonym) presents his computer-aided composition (CAC) system, called 'Harmonization/Melodization Workspace' or 'Music Theater' (MT). He uses 'cybernetic' (in the sense of governing or steering information) to express his intention to overcome the limitations known in the 'computer music' culture. Jaxitron's system is programmed in the non-standard programming language APL.370 The computerized music generation system is designed to automate routine composition tasks on computers by applying a “logic of music”.371 Although of less practical importance today, with its non-ASCII characters in syntax, MT nonetheless serves as an illustration of composition-assisting MC systems in the days before MIDI and MAX. The author represents data numerically, from very low to high level, with arrays and vectors. Higher-level objects are encoded as lists. The system generates only limited sound output; results are written out as lists of notes in a rudimentary and non-standard notational format. Even so, his system has many structural levels inspired by and modeled after the intricate relations between different musical dimensions (harmonic, melodic, rhythmic and formal). A number of rules of thumb, e.g. about reasonable voicing in polyphonic composition, are implemented together with other constraint rules.

At higher levels of musical representation concepts of order and freedom (different from random), tension, climax, distortions and tonality are applied.

A session in MT starts with setting global variables and choosing functions that operate on 'harmonic alternatives' (HA array), 'target melody' (TM vector) and 'utilitarian purposes'. The composer/programmer then selects thematic material (example372) to be transformed and developed according to the system's arguments and functions, which are alterable even in operational mode (during execution).

Jaxitron's model is heavily influenced by Joseph Schillinger, who is less influential today. Schillinger's system for composition (1946) may be seen as an evolutionary step backwards from musical serialism, even though Jaxitron used numbering formulas for music representation and applied mathematical laws of proportion and notions like symmetry, regularity and coordination, as well as a theory of rhythm “extended to include “space” in a totally abstract sense”.373 But even though combinatorial and statistical aspects enter the presentation of MT, Jaxitron and Schillinger implement a more traditional model of rhythm, tonality and melody, unlike many of the high-level and abstract models of serialism and computer music. Jaxitron describes his approach as follows:

To attack the cybernetic problem I must ask, “How do I choose rhythms of duration?” This leads us to further questions about the internal “feedback mechanism that keeps me “on track”, accepting certain patterns and rejecting others. Eventually, the problem distills to,”What makes sense rhythmically, and why?” 374

...

A method that could take us too far afield uses computer graphics and what Schillinger called 'melodic trajectories' and the axes of melody in pitch/time space. Briefly, this involves the construction of 'curves' that will guide the melody... Each sampling then provides a point whose pitch coordinate needs to be adjusted to the nearest tone in the harmonic structure that is being melodized at that point in time. In principle it would seem that formal structure in the melody could be ensured by selecting and manipulating geometric shapes that repeat, reflect, expand, contract, and meet end-to-end through translation.375
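
The idea of sampling a guiding curve and adjusting each sample to the nearest tone of the current harmony can be sketched in Python as follows. This is not Jaxitron's APL code: the function names, the representation of chords as sets of pitch classes, and the rule that ties resolve to the lower tone are assumptions.

```python
def nearest_chord_tone(pitch, chord_pitch_classes):
    """Snap a (possibly fractional) pitch to the closest MIDI chord tone."""
    candidates = [p for p in range(128) if p % 12 in chord_pitch_classes]
    # Ties are resolved in favor of the lower tone (an arbitrary choice).
    return min(candidates, key=lambda c: (abs(c - pitch), c))


def melodize(curve, chords, samples_per_chord):
    """Sample the curve at integer time steps and snap to each chord in turn."""
    melody = []
    time_step = 0
    for chord in chords:
        for _ in range(samples_per_chord):
            melody.append(nearest_chord_tone(curve(time_step), chord))
            time_step += 1
    return melody


def rising_line(t):
    """Hypothetical trajectory: a straight line rising two semitones per step."""
    return 60 + 2 * t


c_major = {0, 4, 7}
print(melodize(rising_line, [c_major], 4))  # [60, 60, 64, 67]
```

Repeating, reflecting or stretching the curve function before sampling would then carry the formal shape over into the melody, as the quotation suggests.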

Applying statistical arguments to pitch systems, Jaxitron is able to measure and quantify phenomena like tension, where the total tension of a chord is derived from the sum of the tensions of all intervals in the chord. He splits this continuous dimension into four tension classes (ranging from less than 10, described as simple or “blah”, to more than 1000, described as complex or “oops”).
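
The summation of interval tensions can be made concrete with a small Python sketch. Everything numeric in it is hypothetical: the per-interval tension values are invented, and the two middle class boundaries (100 and 1000) and their labels are assumptions, since the text names only the outer classes.

```python
from itertools import combinations

# Invented tension values per interval class (0 = unison/octave ... 6 = tritone).
INTERVAL_TENSION = {0: 0, 1: 500, 2: 100, 3: 3, 4: 2, 5: 1, 6: 300}


def interval_class(a, b):
    """Reduce the distance between two MIDI pitches to an interval class 0..6."""
    semitones = abs(a - b) % 12
    return min(semitones, 12 - semitones)


def chord_tension(pitches):
    """Total tension: the sum of the tensions of all intervals in the chord."""
    return sum(
        INTERVAL_TENSION[interval_class(a, b)]
        for a, b in combinations(pitches, 2)
    )


def tension_class(tension):
    """Assign one of four classes; only the outer labels come from the text."""
    if tension < 10:
        return "blah"
    if tension < 100:
        return "class 2"      # assumed intermediate class
    if tension <= 1000:
        return "class 3"      # assumed intermediate class
    return "oops"


print(tension_class(chord_tension([60, 64, 67])))      # blah (C major triad)
print(tension_class(chord_tension([60, 61, 62, 63])))  # oops (chromatic cluster)
```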

Combining the various parts and operations, Jaxitron speculates about stylized accompaniment possibilities and their representational questions, somewhat foreshadowing today's ubiquitous Band-in-a-Box software.376 Jaxitron concludes his exposition of often cryptic APL code with a “Cybernetic song book” that presents results from MT in the form of handwritten traditional scores. This supports his purpose of demonstrating that “computer music” in his “musical cybernetic system” does not need to speak with heavy technological accents.377

Jaxitron's MT makes, I believe, a strong case for the assumption that MC systems can be powerful without too much detail and veracity to the cognitive processes. It was constructed in a strictly bottom-up approach with moderate theoretical weight, building almost exclusively on the fundamental theories of Schillinger. Nonetheless he takes care of countless “cognitive facts” and traditional wisdoms while informally expressing them in MT functions and operations. His sensitivity to constraints and relations between vertical, horizontal and geometrical aspects, as well as his hierarchic and quantifying methods, may qualify his efforts to be part of the AI project of capturing musical meaning and not structure alone. Even though his formulations in APL seem rather unmusical in style378, his results seem convincing enough. MT may stand as an example of very weak AI, i.e. keeping a high distance between conceptual representation and the actual structure represented in composing brains. We may characterize MT as a weakly interactive and highly generative score-driven composer system without listening functionality. It is therefore best described as a composition-assisting system (or CAC) with very low autonomy (limited to some degree of randomness). While it certainly is no player in Rowe's sense, it might be looked at as a composition-oriented instrument.

CHORALE: Expert system for generating Bach-style polyphony

Chorale is a widely respected approach using condition-action pairs (generate section) and generate-and-test method (test section) together with a heuristic section, constraints (negative weights) and recommendations (positive weights). The author, Kemal Ebcioglu, managed to inform Chorale with rules that map successfully to compositional practice in a specific and prolific musical style.
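
A toy Python sketch may help to picture the generate-and-test scheme with weighted heuristics. It is in no way Ebcioglu's rule base: the two-voice setting, the single consonance constraint (treated here as a hard test) and the heuristic weights are all assumptions made for illustration.

```python
# Intervals above the bass (mod 12) accepted by the toy constraint.
CONSONANT = {0, 3, 4, 7, 8, 9}


def generate(previous):
    """Generate section: propose every note within a major third of the last one."""
    return [previous + step for step in range(-4, 5)]


def satisfies_constraints(candidate, bass):
    """Test section: reject candidates that are dissonant against the bass."""
    return (candidate - bass) % 12 in CONSONANT


def heuristic_score(previous, candidate):
    """Heuristic section: positive weight for steps, negative for repetition."""
    step = abs(candidate - previous)
    score = 0
    if step in (1, 2):
        score += 2   # recommendation: stepwise motion
    if step == 0:
        score -= 1   # discouraged: repeated note
    return score


def next_note(previous, bass):
    """Pick the surviving candidate with the best score, preferring small moves."""
    survivors = [c for c in generate(previous) if satisfies_constraints(c, bass)]
    if not survivors:
        return None  # a full system would backtrack here
    return max(
        survivors,
        key=lambda c: (heuristic_score(previous, c), -abs(c - previous)),
    )


print(next_note(67, 60))  # 68
```

A full expert system would apply many such rules per voice and backtrack whenever no candidate survives the test section.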

The chorales produced by Chorale repeatedly impress audiences with their successful emulation of the Bach-style polyphonic repertoire. Often the weakest link in expert system production is knowledge acquisition, the transformation of informal into formal knowledge. In this case, though, centuries of teaching baroque-style polyphonic techniques created a sort of canonical knowledge that only awaited translation into a computational, in this case rule-based, representation.