Representing timbre with a spectrogram
Presented with the sounds of a flute and a guitar playing the same pitch at the same volume, it’s easy to tell which is which; but it’s not so easy to describe the difference in any precise way. Most attempts to do so would come down to describing how each sound is produced (e.g., by blowing or twanging), which is not the same thing as describing the sound itself. Similarly, it’s not easy to notate such differences except by indicating how the sounds are made, and in most notation systems this aspect is largely limited to specifying the sound sources.
However, a given sound source doesn’t always produce the same kind of sound. The string of a violin can be bowed, plucked, or hit with the wood of the bow, and each technique results in a radically different sound. In staff notation, these playing techniques are indicated with verbal descriptions in Italian: arco, pizzicato, and col legno respectively.
That’s generally sufficient for classical music, where there is a certain accepted ideal for the sound of (for instance) a string being bowed, and within that category the character of the sound is not supposed to vary much. The same goes for other instruments in classical music, and for classical singers.
In a lot of other music, though, variation in the character of the sounds from a single source is absolutely vital. A successful pop or rock singer not only has a unique and instantly recognisable voice, but is able to vary the tone of that voice from moment to moment to convey the emotion of a song with conviction.
For similar aesthetic reasons, instruments that take a solo role in popular music have tended to be ones that can vary their tone widely, such as the electric guitar and the saxophone. If we want to notate such music, it might be worth trying to specify these differences of tone in a more precise and efficient way than describing them in words.
The differences we are talking about are called differences of “timbre.” Sometimes glossed as “tone quality” or “tone color,” timbre is hard to define except in a negative way: it’s whatever aspect of sound is left when we take away pitch, volume, duration, and onset timing.
The challenge in trying to notate timbre is that it can’t easily be “measured” like those other parameters of sound. The pitch of a sound is more or less high; the volume is more or less loud; the duration is more or less long; and the onset timing is more or less late. But the timbre of a sound is more or less… what? Rough? Breathy? Scrapy? Resonant? It’s hard even to agree on meaningful adjectives, let alone imagine a scale by which we could measure those qualities and determine exactly how much more “rough” (or whatever) one sound is than another.
In other words, timbre is not a one-dimensional parameter. In practice, differences in timbre tend to involve three main things:
(1) Attack: How abrupt is the onset of the sound? Does the sound begin at full volume, or is there a momentary “fade-in”? The subsequent “envelope” of intensity may also play a role, e.g., through the rate of decay (for impulsive sounds) and the “release,” i.e., how each sound ends; but the attack is usually more crucial, simply because it comes first.
(2) Overtones: A sound of definite pitch normally comprises a number of different frequencies, primarily multiples of the “fundamental” frequency that we hear as the pitch of the sound. Sounds of different timbre differ in the relative strength of these “overtones,” as do different vowel sounds in speech. Sounds of definite pitch produced by solid objects, such as bells and xylophone keys, often involve overtones that are not multiples of the fundamental frequency, and this also affects their timbre.
(3) Noise: Sound of indefinite pitch may combine with either of the above, and may do so differently for the onset of a sound and for its subsequent “steady state.” The timbre of the harpsichord is distinguished in large part by its “noisy” percussive attack, while Japanese shakuhachi flute players often intentionally mix unpitched breath sounds with flute sounds of definite pitch.
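The three ingredients listed above can be made concrete by synthesizing a tone from them. The following is only a minimal sketch using the Python standard library: the function name synth_tone, its parameters, and the two example tones are hypothetical, and the linear fade-in is a deliberately crude stand-in for a real attack.

```python
import math
import random

def synth_tone(f0, harmonic_amps, attack_s, noise_level=0.0,
               duration_s=0.5, sample_rate=8000, seed=0):
    """Return a list of samples built from overtones, an attack, and noise."""
    rng = random.Random(seed)  # fixed seed so the noise is reproducible
    samples = []
    for i in range(int(duration_s * sample_rate)):
        t = i / sample_rate
        # (2) Overtones: sine waves at whole-number multiples of the fundamental,
        # each with its own relative strength.
        pitched = sum(amp * math.sin(2 * math.pi * f0 * (k + 1) * t)
                      for k, amp in enumerate(harmonic_amps))
        # (3) Noise: random fluctuation with no definite pitch.
        noise = noise_level * rng.uniform(-1.0, 1.0)
        # (1) Attack: a linear fade-in lasting attack_s seconds.
        gain = min(1.0, t / attack_s) if attack_s > 0 else 1.0
        samples.append(gain * (pitched + noise))
    return samples

# Two hypothetical timbres at the same pitch (220 Hz):
smooth = synth_tone(220.0, [1.0, 0.3], attack_s=0.1)
raspy = synth_tone(220.0, [1.0, 0.8, 0.6, 0.5], attack_s=0.005, noise_level=0.5)
```

Both tones share the same fundamental, so they would be heard at the same pitch; only the fade-in time, the balance of overtones, and the amount of noise differ.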
To indicate all those things in a score sounds like a lot of trouble, and even then the information about timbre might dominate the score and overshadow other aspects of the music; so it is probably only worth doing if timbre is a special interest. Still, it can be done, and in some cases it can capture important things about the music.
To notate timbre with any real precision will require the use of sound analysis software. Other pages explain how to use this for measuring onset timing and producing a pitch-time graph. What we now want is a graph that shows the pitch, not just of the fundamental frequencies, but of the overtones as well.
We also want the graph to show volume, or “intensity,” both overall (which will reveal the abruptness or otherwise of the attacks) and for individual overtones (since the balance of overtones partly determines timbre).
In addition, we want it to show the intensity of any sounds of indefinite pitch (since noise is also a component of timbre). In fact, we want a graph that shows whatever sound is present throughout the spectrum of pitch - a type of graph called a “spectrogram.”
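For readers curious about what lies behind such a graph, here is a rough sketch of how a spectrogram can be computed: the sound is cut into short overlapping frames, and the strength of each frequency is measured within each frame. This is a generic short-time Fourier analysis written with the Python standard library only; it is not necessarily how any particular sound analysis program works, and the frame size, hop size, and test tone are assumptions chosen to keep the example small.

```python
import cmath
import math

def spectrogram(samples, sample_rate, frame_size=256, hop=128):
    """Short-time Fourier magnitudes: returns (times, freqs, rows).

    rows[i][k] is the strength of frequency freqs[k] around time times[i].
    """
    # A Hann window tapers each frame, which reduces smearing between frequencies.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    # Precompute the complex exponentials used by the discrete Fourier transform.
    twiddle = [cmath.exp(-2j * math.pi * m / frame_size)
               for m in range(frame_size)]
    n_bins = frame_size // 2 + 1
    freqs = [k * sample_rate / frame_size for k in range(n_bins)]
    times, rows = [], []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = [abs(sum(frame[n] * twiddle[(k * n) % frame_size]
                       for n in range(frame_size)))
               for k in range(n_bins)]
        times.append((start + frame_size / 2) / sample_rate)
        rows.append(row)
    return times, freqs, rows

# Hypothetical test tone: a 250 Hz fundamental plus a weaker component
# one octave above it (500 Hz).
RATE = 8000
tone = [math.sin(2 * math.pi * 250 * i / RATE)
        + 0.5 * math.sin(2 * math.pi * 500 * i / RATE)
        for i in range(1024)]
times, freqs, rows = spectrogram(tone, RATE)
strongest = [freqs[row.index(max(row))] for row in rows]
print(strongest)  # the strongest frequency found in each frame
```

Drawing each row as a column of grey values, darker where the number is larger, gives the kind of picture discussed in the rest of this page. A numerical library would do the same job far faster; the plain-Python version is used here only so that every step is visible.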
We’ll explore the use of spectrograms by comparing different performances of the same song. In a chapter of the textbook Worlds of Music, ethnomusicologist Jeff Todd Titon contrasts two renditions of the well-known hymn “Amazing Grace” in American Baptist churches, one black and one white. Each has a lead singer and a chorus, and Titon describes the vocal timbres of the two lead singers. The black leader’s timbre, he says, “alternates between buttery smooth and raspy coarse,” while the white leader’s is “unvaryingly coarse.”
Titon’s staff notation of the two performances brings out many contrasts in their handling of pitch and timing, but doesn’t capture the difference in timbre or show (for instance) when the black leader’s timbre is “buttery smooth” and when it is “raspy coarse.” Yet this “alternation” is surely not random, but related to expression. Can we shed more light on it with a spectrogram?
Once the sound analysis software is set to display a spectrogram, we get a picture of all the overtones and any other pitches present, aligned with the pitch-time graph of the melody. We may need to “Set Displayed Frequency Range” differently (through the “View” menu) to show all the pitches we want in a readily readable form.
We also get a waveform graph at the bottom, giving a rough idea of how intensity changes over time and thus how abrupt the attacks are. We can take a screenshot of all this and paste it into our graphics program to annotate it as we did before, for instance by showing the scale degrees and the syllables of the lyrics. The “black” version then comes out looking as follows. (This score also shows a solution to the problem of lyrics that go too fast to fit the time scale.)
The grey markings in the background that look like a charcoal pencil sketch indicate the overtones and other pitches that are present besides the pitches we consciously hear in the melody. At first there is just one prominent grey line running in parallel with the melody line: this is the second harmonic (the first overtone), an octave above the fundamental pitch, and the absence of other pitches reflects the pure, “buttery smooth” timbre of the singer’s opening phrase.
Then, at “how sweet it sound,” there are more grey lines, but they still run in parallel, indicating a richer spectrum of overtones but not yet a “raspy coarse” timbre. That comes in the second half of the excerpt, in the places where the grey markings start to fill up the space and don’t form distinct parallel lines.
The smudgy grey shading indicates the presence of “noise,” sound in which we don’t perceive a definite pitch because too many different frequencies are mixed together. The “raspiness” that this produces seems to intensify the sense of affirmation implied by the use of the word “yeah” and the repetition of the beginning of the lyrics at a higher pitch level. In other words, the singer is indeed using variation of timbre as an expressive resource.
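The visual difference between distinct parallel lines and smudgy shading can also be expressed as a number. One common measure is “spectral flatness”: the geometric mean of the power spectrum divided by its arithmetic mean, which is close to 0 when a few overtones dominate and close to 1 when energy is spread evenly, as in noise. The sketch below is an illustration only; the analysis above relies on reading the spectrogram by eye, and the function name and the two toy spectra are made up for the example.

```python
import math

def spectral_flatness(magnitudes, floor=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (0 to 1)."""
    # The floor avoids taking the logarithm of zero for silent frequency bins.
    power = [max(m * m, floor) for m in magnitudes]
    log_mean = sum(math.log(p) for p in power) / len(power)
    arithmetic_mean = sum(power) / len(power)
    return math.exp(log_mean) / arithmetic_mean

# Hypothetical spectrogram columns (one magnitude per frequency bin).
# "Parallel lines": three strong overtones over a nearly silent background.
harmonic_frame = [0.01] * 64
for k in (8, 16, 24):
    harmonic_frame[k] = 1.0
# "Smudge": energy spread fairly evenly across all frequencies.
noisy_frame = [0.5 + 0.1 * ((k * 7) % 5) for k in range(64)]

print(spectral_flatness(harmonic_frame))  # much closer to 0
print(spectral_flatness(noisy_frame))     # much closer to 1
```

Applied to each column of a spectrogram in turn, a measure like this could mark the moments where a voice turns from “smooth” to “raspy,” though any threshold between the two would have to be chosen by ear.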
No beat or bar lines have been added to this graph because it’s far from clear where they would fall. In the “white” version, the rhythm is much more regular, and a meter of three beats per bar has been indicated in the score. (However, the computer graph reveals that the beats are not evenly spaced: the third beat of each bar is longer than the others.) The vocal timbre is visibly different from the “black” version too.
This time the melody line is paralleled by grey “charcoal” lines throughout, indicating multiple strong overtones. The relative strength of the grey lines varies - a result of the different vowel sounds in the lyrics, which, as mentioned, each have a different “balance” of overtones. The final syllable “me” shows only one distinct overtone above it, approaching the pure or “buttery smooth” timbre that began the black version, but this time in the context of a climactic high note.
Otherwise, the graph always shows at least three distinct overtones - sometimes many more - though there is none of the grey smudging that would indicate unpitched sound. The overtone-rich but noise-free spectrum is probably what led Titon to describe this singer’s timbre as “unvaryingly coarse” but not “raspy.”
For another contrast, and to emphasize that vocal timbre is a matter of culture and not genetics, here is the same song sung by black Canadian soprano Marie-Josée Lord. Again, the beats are not evenly spaced, but this time there is less consistency: the third beat of the bar is sometimes shorter and sometimes longer than the other beats. This may be related to the extremely slow tempo, which is right on the edge of our ability to feel a regular beat at all.
The “pure” tone cultivated by a classical singer results in very little “charcoal” in the spectrogram. Again, there is some variation between vowel sounds, but in general the second and third harmonics (the two lowest overtones, usually the most prominent in our other examples) are weak or absent, and it is the higher overtones that ring out. This, together with the consistent vibrato apparent from the waviness of the melody line, may help explain the classical singer’s ability to “project” the voice without amplification above the sound of a full orchestra.
Incorporating spectrograms into our notation has enabled us to indicate differences in timbre, both between different singers and between different sounds made by the same singer. The waveform graph can help distinguish timbres too, although in this case it suggests that none of the attacks are particularly abrupt and that differences in the overtone balance and noise component may be more significant for vocal music.
A similar approach can be used for instrumental music when variation of timbre seems important enough to specify.
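Where attacks do matter, as they often will for plucked or struck instruments, their abruptness can be estimated from the waveform rather than judged by eye. The sketch below measures how long the loudness takes to rise from 10% to 90% of its peak. The thresholds, the 10-millisecond frame length, and the test tones are assumptions made for the example, not settings taken from any particular program.

```python
import math

RATE = 8000  # samples per second (an assumption for this sketch)

def faded_sine(attack_s, freq=200.0, duration_s=0.3):
    """Hypothetical test tone: a sine wave with a linear fade-in of attack_s seconds."""
    out = []
    for i in range(int(duration_s * RATE)):
        t = i / RATE
        gain = min(1.0, t / attack_s) if attack_s > 0 else 1.0
        out.append(gain * math.sin(2 * math.pi * freq * t))
    return out

def rms_envelope(samples, frame_size=80):
    """Loudness (root mean square) of each consecutive block of samples."""
    env = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return env

def attack_time(samples, sample_rate, frame_size=80, low=0.1, high=0.9):
    """Seconds taken for the envelope to rise from low to high fraction of its peak."""
    env = rms_envelope(samples, frame_size)
    peak = max(env)
    if peak == 0:
        return 0.0  # silence has no attack to measure
    first_low = next(i for i, e in enumerate(env) if e >= low * peak)
    first_high = next(i for i, e in enumerate(env) if e >= high * peak)
    return (first_high - first_low) * frame_size / sample_rate

print(attack_time(faded_sine(0.0), RATE))  # abrupt onset
print(attack_time(faded_sine(0.1), RATE))  # gradual onset
```

The result is only as fine-grained as the frame length, so very fast attacks all come out as zero; a shorter frame gives finer timing at the cost of a bumpier envelope.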
If a simpler and “cleaner” score is wanted, timbre can also be specified in less detail and without relying on sound analysis software. With sounds of unspecified duration, for instance, distinct contrasts of timbre can be indicated by varying the shapes and positions of symbols (see Specifying other information where duration is unspecified). A solution along similar lines could no doubt be developed for sounds of specified duration and pitch, especially if we accept a resource that we have so far avoided: the use of multiple colors.
Sources of audio
“Amazing Grace” performed by New Bethel Baptist Church and by Fellowship Independent Baptist Church, field recordings by Jeff Todd Titon published in the CD set accompanying Jeff Todd Titon, ed., Worlds of Music: An Introduction to the Music of the World’s Peoples, 3rd ed., New York: Schirmer Books, 1996, CD 1 tracks 20 and 21 (discussed on pp. 144–149).
“Amazing Grace” sung by Marie-Josée Lord on Marie-Josée Lord: Amazing Grace, Atma Classique CD, ASIN: B00LYJINAK, track 7.
Sources of software
Tony software has been developed at Queen Mary, University of London by the authors of the following article:
M. Mauch, C. Cannam, R. Bittner, G. Fazekas, J. Salamon, J. Dai, J. Bello and S. Dixon, “Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency,” in Proceedings of the First International Conference on Technologies for Music Notation and Representation, 2015. https://code.soundsoftware.ac.uk/attachments/download/1423/tony-paper_preprint.pdf (PDF, 319KB)