Edward Sapir (1884–1939). Language: An Introduction to the Study of Speech. 1921.

Chapter 3

The Sounds of Language

We have seen that the mere phonetic framework of speech does not constitute the inner fact of language and that the single sound of articulated speech is not, as such, a linguistic element at all. For all that, speech is so inevitably bound up with sounds and their articulation that we can hardly avoid giving the subject of phonetics some general consideration. Experience has shown that neither the purely formal aspects of a language nor the course of its history can be fully understood without reference to the sounds in which this form and this history are embodied. A detailed survey of phonetics would be both too technical for the general reader and too loosely related to our main theme to warrant the needed space, but we can well afford to consider a few outstanding facts and ideas connected with the sounds of language.

The feeling that the average speaker has of his language is that it is built up, acoustically speaking, of a comparatively small number of distinct sounds, each of which is rather accurately provided for in the current alphabet by one letter or, in a few cases, by two or more alternative letters. As for the languages of foreigners, he generally feels that, aside from a few striking differences that cannot escape even the uncritical ear, the sounds they use are the same as those he is familiar with but that there is a mysterious “accent” to these foreign languages, a certain unanalyzed phonetic character, apart from the sounds as such, that gives them their air of strangeness. This naïve feeling is largely illusory on both scores. Phonetic analysis convinces one that the number of clearly distinguishable sounds and nuances of sounds that are habitually employed by the speakers of a language is far greater than they themselves recognize. Probably not one English speaker out of a hundred has the remotest idea that the t of a word like sting is not at all the same sound as the t of teem, the latter t having a fullness of “breath release” that is inhibited in the former case by the preceding s; that the ea of meat is of perceptibly shorter duration than the ea of mead; or that the final s of a word like heads is not the full, buzzing z sound of the s in such a word as please. It is the frequent failure of foreigners, who have acquired a practical mastery of English and who have eliminated all the cruder phonetic shortcomings of their less careful brethren, to observe such minor distinctions that helps to give their English pronunciation the curiously elusive “accent” that we all vaguely feel. We do not diagnose the “accent” as the total acoustic effect produced by a series of slight but specific phonetic errors for the very good reason that we have never made clear to ourselves our own phonetic stock in trade. If two languages taken at random, say English and Russian, are compared as to their phonetic systems, we are more apt than not to find that very few of the phonetic elements of the one find an exact analogue in the other. Thus, the t of a Russian word like tam “there” is neither the English t of sting nor the English t of teem. It differs from both in its “dental” articulation, in other words, in being produced by contact of the tip of the tongue with the upper teeth, not, as in English, by contact of the tongue back of the tip with the gum ridge above the teeth; moreover, it differs from the t of teem also in the absence of a marked “breath release” before the following vowel is attached, so that its acoustic effect is of a more precise, “metallic” nature than in English. Again, the English l is unknown in Russian, which possesses, on the other hand, two distinct l-sounds that the normal English speaker would find it difficult exactly to reproduce—a “hollow,” guttural-like l and a “soft,” palatalized l-sound that is only very approximately rendered, in English terms, as ly. Even so simple and, one would imagine, so invariable a sound as m differs in the two languages. In a Russian word like most “bridge” the m is not the same as the m of the English word most; the lips are more fully rounded during its articulation, so that it makes a heavier, more resonant impression on the ear. The vowels, needless to say, differ completely in English and Russian, hardly any two of them being quite the same.

I have gone into these illustrative details, which are of little or no specific interest for us, merely in order to provide something of an experimental basis to convince ourselves of the tremendous variability of speech sounds. Yet a complete inventory of the acoustic resources of all the European languages, the languages nearer home, while unexpectedly large, would still fall far short of conveying a just idea of the true range of human articulation. In many of the languages of Asia, Africa, and aboriginal America there are whole classes of sounds that most of us have no knowledge of. They are not necessarily more difficult of enunciation than sounds more familiar to our ears; they merely involve such muscular adjustments of the organs of speech as we have never habituated ourselves to. It may be safely said that the total number of possible sounds is greatly in excess of those actually in use. Indeed, an experienced phonetician should have no difficulty in inventing sounds that are unknown to objective investigation. One reason why we find it difficult to believe that the range of possible speech sounds is indefinitely large is our habit of conceiving the sound as a simple, unanalyzable impression instead of as the resultant of a number of distinct muscular adjustments that take place simultaneously. A slight change in any one of these adjustments gives us a new sound which is akin to the old one, because of the continuance of the other adjustments, but which is acoustically distinct from it, so sensitive has the human ear become to the nuanced play of the vocal mechanism. Another reason for our lack of phonetic imagination is the fact that, while our ear is delicately responsive to the sounds of speech, the muscles of our speech organs have early in life become exclusively accustomed to the particular adjustments and systems of adjustment that are required to produce the traditional sounds of the language. All or nearly all other adjustments have become permanently inhibited, whether through inexperience or through gradual elimination. Of course the power to produce these inhibited adjustments is not entirely lost, but the extreme difficulty we experience in learning the new sounds of foreign languages is sufficient evidence of the strange rigidity that has set in for most people in the voluntary control of the speech organs. The point may be brought home by contrasting the comparative lack of freedom of voluntary speech movements with the all but perfect freedom of voluntary gesture. Our rigidity in articulation is the price we have had to pay for easy mastery of a highly necessary symbolism. One cannot be both splendidly free in the random choice of movements and selective with deadly certainty.

There are, then, an indefinitely large number of articulated sounds available for the mechanics of speech; any given language makes use of an explicit, rigidly economical selection of these rich resources; and each of the many possible sounds of speech is conditioned by a number of independent muscular adjustments that work together simultaneously towards its production. A full account of the activity of each of the organs of speech—in so far as its activity has a bearing on language—is impossible here, nor can we concern ourselves in a systematic way with the classification of sounds on the basis of their mechanics. A few bold outlines are all that we can attempt. The organs of speech are the lungs and bronchial tubes; the throat, particularly that part of it which is known as the larynx or, in popular parlance, the “Adam’s apple”; the nose; the uvula, which is the soft, pointed, and easily movable organ that depends from the rear of the palate; the palate, which is divided into a posterior, movable “soft palate” or velum and a “hard palate”; the tongue; the teeth; and the lips. The palate, lower palate, tongue, teeth, and lips may be looked upon as a combined resonance chamber, whose constantly varying shape, chiefly due to the extreme mobility of the tongue, is the main factor in giving the outgoing breath its precise quality of sound.

The lungs and bronchial tubes are organs of speech only in so far as they supply and conduct the current of outgoing air without which audible articulation is impossible. They are not responsible for any specific sound or acoustic feature of sounds except, possibly, accent or stress. It may be that differences of stress are due to slight differences in the contracting force of the lung muscles, but even this influence of the lungs is denied by some students, who explain the fluctuations of stress that do so much to color speech by reference to the more delicate activity of the glottal cords. These glottal cords are two small, nearly horizontal, and highly sensitive membranes within the larynx, which consists, for the most part, of two large and several smaller cartilages and of a number of small muscles that control the action of the cords.

The cords, which are attached to the cartilages, are to the human speech organs what the two vibrating reeds are to a clarinet or the strings to a violin. They are capable of at least three distinct types of movement, each of which is of the greatest importance for speech. They may be drawn towards or away from each other, they may vibrate like reeds or strings, and they may become lax or tense in the direction of their length. The last class of these movements allows the cords to vibrate at different “lengths” or degrees of tenseness and is responsible for the variations in pitch which are present not only in song but in the more elusive modulations of ordinary speech. The two other types of glottal action determine the nature of the voice, “voice” being a convenient term for breath as utilized in speech. If the cords are well apart, allowing the breath to escape in unmodified form, we have the condition technically known as “voicelessness.” All sounds produced under these circumstances are “voiceless” sounds. Such are the simple, unmodified breath as it passes into the mouth, which is, at least approximately, the same as the sound that we write h, also a large number of special articulations in the mouth chamber, like p and s. On the other hand, the glottal cords may be brought tight together, without vibrating. When this happens, the current of breath is checked for the time being. The slight choke or “arrested cough” that is thus made audible is not recognized in English as a definite sound but occurs nevertheless not infrequently. This momentary check, technically known as a “glottal stop,” is an integral element of speech in many languages, as Danish, Lettish, certain Chinese dialects, and nearly all American Indian languages. Between the two extremes of voicelessness, that of completely open breath and that of checked breath, lies the position of true voice. In this position the cords are close together, but not so tightly as to prevent the air from streaming through; the cords are set vibrating and a musical tone of varying pitch results. A tone so produced is known as a “voiced sound.” It may have an indefinite number of qualities according to the precise position of the upper organs of speech. Our vowels, nasals (such as m and n), and such sounds as b, z, and l are all voiced sounds. The most convenient test of a voiced sound is the possibility of pronouncing it on any given pitch, in other words, of singing on it. The voiced sounds are the most clearly audible elements of speech. As such they are the carriers of practically all significant differences in stress, pitch, and syllabification. The voiceless sounds are articulated noises that break up the stream of voice with fleeting moments of silence. Acoustically intermediate between the freely unvoiced and the voiced sounds are a number of other characteristic types of voicing, such as murmuring and whisper. These and still other types of voice are relatively unimportant in English and most other European languages, but there are languages in which they rise to some prominence in the normal flow of speech.

The nose is not an active organ of speech, but it is highly important as a resonance chamber. It may be disconnected from the mouth, which is the other great resonance chamber, by the lifting of the movable part of the soft palate so as to shut off the passage of the breath into the nasal cavity; or, if the soft palate is allowed to hang down freely and unobstructively, so that the breath passes into both the nose and the mouth, these make a combined resonance chamber. Such sounds as b and a (as in father) are voiced “oral” sounds, that is, the voiced breath does not receive a nasal resonance. As soon as the soft palate is lowered, however, and the nose added as a participating resonance chamber, the sounds b and a take on a peculiar “nasal” quality and become, respectively, m and the nasalized vowel written an in French (e.g., sang, tant). The only English sounds that normally receive a nasal resonance are m, n, and the ng sound of sing. Practically all sounds, however, may be nasalized, not only the vowels—nasalized vowels are common in all parts of the world—but such sounds as l or z. Voiceless nasals are perfectly possible. They occur, for instance, in Welsh and in quite a number of American Indian languages.

The organs that make up the oral resonance chamber may articulate in two ways. The breath, voiced or unvoiced, nasalized or unnasalized, may be allowed to pass through the mouth without being checked or impeded at any point; or it may be either momentarily checked or allowed to stream through a greatly narrowed passage with resulting air friction. There are also transitions between the two latter types of articulation. The unimpeded breath takes on a particular color or quality in accordance with the varying shape of the oral resonance chamber. This shape is chiefly determined by the position of the movable parts—the tongue and the lips. As the tongue is raised or lowered, retracted or brought forward, held tense or lax, and as the lips are pursed (“rounded”) in varying degree or allowed to keep their position of rest, a large number of distinct qualities result. These oral qualities are the vowels. In theory their number is infinite, in practice the ear can differentiate only a limited, yet a surprisingly large, number of resonance positions. Vowels, whether nasalized or not, are normally voiced sounds; in not a few languages, however, “voiceless vowels” also occur.

The remaining oral sounds are generally grouped together as “consonants.” In them the stream of breath is interfered with in some way, so that a lesser resonance results, and a sharper, more incisive quality of tone. There are four main types of articulation generally recognized within the consonantal group of sounds. The breath may be completely stopped for a moment at some definite point in the oral cavity. Sounds so produced, like t or d or p, are known as “stops” or “explosives.” Or the breath may be continuously obstructed through a narrow passage, not entirely checked. Examples of such “spirants” or “fricatives,” as they are called, are s and z and y. The third class of consonants, the “laterals,” are semi-stopped. There is a true stoppage at the central point of articulation, but the breath is allowed to escape through the two side passages or through one of them. Our English d, for instance, may be readily transformed into l, which has the voicing and the position of d, merely by depressing the sides of the tongue on either side of the point of contact sufficiently to allow the breath to come through. Laterals are possible in many distinct positions. They may be unvoiced (the Welsh ll is an example) as well as voiced. Finally, the stoppage of the breath may be rapidly intermittent; in other words, the active organ of contact—generally the point of the tongue, less often the uvula—may be made to vibrate against or near the point of contact. These sounds are the “trills” or “rolled consonants,” of which the normal English r is a none too typical example. They are well developed in many languages, however, generally in voiced form, sometimes, as in Welsh and Paiute, in unvoiced form as well.

The oral manner of articulation is naturally not sufficient to define a consonant. The place of articulation must also be considered. Contacts may be formed at a large number of points, from the root of the tongue to the lips. It is not necessary here to go at length into this somewhat complicated matter. The contact is either between the root of the tongue and the throat, some part of the tongue and a point on the palate (as in k or ch or l), some part of the tongue and the teeth (as in the English th of thick and then), the teeth and one of the lips (practically always the upper teeth and lower lip, as in f), or the two lips (as in p or English w). The tongue articulations are the most complicated of all, as the mobility of the tongue allows various points on its surface, say the tip, to articulate against a number of opposed points of contact. Hence arise many positions of articulation that we are not familiar with, such as the typical “dental” position of Russian or Italian t and d; or the “cerebral” position of Sanskrit and other languages of India, in which the tip of the tongue articulates against the hard palate. As there is no break at any point from the rims of the teeth back to the uvula, nor from the tip of the tongue back to its root, it is evident that all the articulations that involve the tongue form a continuous organic (and acoustic) series. The positions grade into each other, but each language selects a limited number of clearly defined positions as characteristic of its consonantal system, ignoring transitional or extreme positions. Frequently a language allows a certain latitude in the fixing of the required position. This is true, for instance, of the English k-sound, which is articulated much further to the front in a word like kin than in cool. We ignore this difference, psychologically, as a non-essential, mechanical one. Another language might well recognize the difference, or only a slightly greater one, as significant, as paralleling the distinction in position between the k of kin and the t of tin.

The organic classification of speech sounds is a simple matter after what we have learned of their production. Any such sound may be put into its proper place by the appropriate answer to four main questions:—What is the position of the glottal cords during its articulation? Does the breath pass into the mouth alone or is it also allowed to stream into the nose? Does the breath pass freely through the mouth or is it impeded at some point and, if so, in what manner? What are the precise points of articulation in the mouth? This four-fold classification of sounds, worked out in all its detailed ramifications, is sufficient to account for all, or practically all, the sounds of language.
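The four questions can be read as the four fields of a record. The following is a minimal sketch in Python and is not part of the original text: the names `Sound`, `Glottis`, `Manner`, `SOUNDS`, and `differing_answers` are illustrative assumptions, the place labels are deliberately coarse, and the sample entries are limited to English sounds described in the preceding paragraphs.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Glottis(Enum):
    """Question 1: position of the glottal cords (labels are illustrative)."""
    OPEN = "cords apart (voiceless)"
    CLOSED = "cords shut (glottal stop)"
    VOICED = "cords close together and vibrating"


class Manner(Enum):
    """Question 3: free passage through the mouth, or impeded and how."""
    FREE = "unimpeded (vowel)"
    STOP = "stop"
    SPIRANT = "spirant"
    LATERAL = "lateral"
    TRILL = "trill"


@dataclass(frozen=True)
class Sound:
    glottis: Glottis  # 1. position of the glottal cords
    nasal: bool       # 2. does the breath also stream into the nose?
    manner: Manner    # 3. manner of articulation in the mouth
    place: str        # 4. point of articulation (coarse, assumed label)


# Hypothetical sample inventory. "m" is entered as a nasalized "b",
# following the chapter's description of lowering the soft palate.
SOUNDS = {
    "p": Sound(Glottis.OPEN, False, Manner.STOP, "lips"),
    "b": Sound(Glottis.VOICED, False, Manner.STOP, "lips"),
    "m": Sound(Glottis.VOICED, True, Manner.STOP, "lips"),
    "t": Sound(Glottis.OPEN, False, Manner.STOP, "gum ridge"),
    "d": Sound(Glottis.VOICED, False, Manner.STOP, "gum ridge"),
    "l": Sound(Glottis.VOICED, False, Manner.LATERAL, "gum ridge"),
}


def differing_answers(a: str, b: str) -> list[str]:
    """Return the names of the questions on which two sounds differ."""
    first, second = SOUNDS[a], SOUNDS[b]
    return [
        f.name
        for f in fields(Sound)
        if getattr(first, f.name) != getattr(second, f.name)
    ]


print(differing_answers("b", "m"))  # ['nasal']
print(differing_answers("d", "l"))  # ['manner']
print(differing_answers("p", "b"))  # ['glottis']
```

In this sketch a single changed answer yields a neighboring sound, which mirrors the earlier remark that a slight change in any one adjustment gives a new sound akin to the old one. A fuller treatment would need much finer place distinctions than the two labels assumed here.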

The phonetic habits of a given language are not exhaustively defined by stating that it makes use of such and such particular sounds out of the all but endless gamut that we have briefly surveyed. There remains the important question of the dynamics of these phonetic elements. Two languages may, theoretically, be built up of precisely the same series of consonants and vowels and yet produce utterly different acoustic effects. One of them may not recognize striking variations in the lengths or “quantities” of the phonetic elements, the other may note such variations most punctiliously (in probably the majority of languages long and short vowels are distinguished; in many, as in Italian or Swedish or Ojibwa, long consonants are recognized as distinct from short ones). Or the one, say English, may be very sensitive to relative stresses, while in the other, say French, stress is a very minor consideration. Or, again, the pitch differences which are inseparable from the actual practice of language may not affect the word as such, but, as in English, may be a more or less random or, at best, but a rhetorical phenomenon, while in other languages, as in Swedish, Lithuanian, Chinese, Siamese, and the majority of African languages, they may be more finely graduated and felt as integral characteristics of the words themselves. Varying methods of syllabifying are also responsible for noteworthy acoustic differences. Most important of all, perhaps, are the very different possibilities of combining the phonetic elements. Each language has its peculiarities. The ts combination, for instance, is found in both English and German, but in English it can only occur at the end of a word (as in hats), while it occurs freely in German as the psychological equivalent of a single sound (as in Zeit, Katze). Some languages allow of great heapings of consonants or of vocalic groups (diphthongs), in others no two consonants or no two vowels may ever come together. Frequently a sound occurs only in a special position or under special phonetic circumstances. In English, for instance, the z-sound of azure cannot occur initially, while the peculiar quality of the t of sting is dependent on its being preceded by the s. These dynamic factors, in their totality, are as important for the proper understanding of the phonetic genius of a language as the sound system itself, often far more so.

We have already seen, in an incidental way, that phonetic elements or such dynamic features as quantity and stress have varying psychological “values.” The English ts of hats is merely a t followed by a functionally independent s, the ts of the German word Zeit has an integral value equivalent, say, to the t of the English word tide. Again, the t of time is indeed noticeably distinct from that of sting, but the difference, to the consciousness of an English-speaking person, is quite irrelevant. It has no “value.” If we compare the t-sounds of Haida, the Indian language spoken in the Queen Charlotte Islands, we find that precisely the same difference of articulation has a real value. In such a word as sting “two,” the t is pronounced precisely as in English, but in sta “from” the t is clearly “aspirated,” like that of time. In other words, an objective difference that is irrelevant in English is of functional value in Haida; from its own psychological standpoint the t of sting is as different from that of sta as, from our standpoint, is the t of time from the d of divine. Further investigation would yield the interesting result that the Haida ear finds the difference between the English t of sting and the d of divine as irrelevant as the naïve English ear finds that of the t-sounds of sting and time. The objective comparison of sounds in two or more languages is, then, of no psychological or historical significance unless these sounds are first “weighted,” unless their phonetic “values” are determined. These values, in turn, flow from the general behavior and functioning of the sounds in actual speech.
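The contrast between English and Haida can be sketched as two different groupings of the same objective sounds. The Python below is an illustration only and not part of the original text: the labels `t_aspirated`, `t_plain`, and `d`, the table `VALUES`, and the function `same_value` are hypothetical names, and the groupings simply restate the description given above.

```python
# Three objectively distinct sounds (labels are illustrative):
#   t_aspirated : the t of English "time" and of Haida sta "from"
#   t_plain     : the t of English "sting" and of Haida sting "two"
#   d           : the d of English "divine"
# Each language assigns these sounds to its own functional "values".
VALUES = {
    "English": {"t_aspirated": "t", "t_plain": "t", "d": "d"},
    "Haida": {
        "t_aspirated": "aspirated t",
        "t_plain": "plain t",
        "d": "plain t",
    },
}


def same_value(language: str, sound_a: str, sound_b: str) -> bool:
    """True if the language treats the two sounds as functionally the same."""
    values = VALUES[language]
    return values[sound_a] == values[sound_b]


for language in VALUES:
    print(
        language,
        same_value(language, "t_aspirated", "t_plain"),
        same_value(language, "t_plain", "d"),
    )
# English True False
# Haida False True
```

The objective inventory is identical in both rows of the table; only the weighting differs, which is the sense in which a bare comparison of sounds across languages says little until their values are determined.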

These considerations as to phonetic value lead to an important conception. Back of the purely objective system of sounds that is peculiar to a language and which can be arrived at only by a painstaking phonetic analysis, there is a more restricted “inner” or “ideal” system which, while perhaps equally unconscious as a system to the naïve speaker, can far more readily than the other be brought to his consciousness as a finished pattern, a psychological mechanism. The inner sound-system, overlaid though it may be by the mechanical or the irrelevant, is a real and an immensely important principle in the life of a language. It may persist as a pattern, involving number, relation, and functioning of phonetic elements, long after its phonetic content is changed. Two historically related languages or dialects may not have a sound in common, but their ideal sound-systems may be identical patterns. I would not for a moment wish to imply that this pattern may not change. It may shrink or expand or change its functional complexion, but its rate of change is infinitely less rapid than that of the sounds as such. Every language, then, is characterized as much by its ideal system of sounds and by the underlying phonetic pattern (system, one might term it, of symbolic atoms) as by a definite grammatical structure. Both the phonetic and conceptual structures show the instinctive feeling of language for form.
