How do climate and terrain affect the evolution of languages?

Linguist Ian Maddieson answers

Human speech is often compared to an orchestra. The orchestra has instruments that play higher-pitched notes, such as the flute or piccolo, and those that play lower notes, such as the cello or bassoon. It can play long drawn-out notes with a very sonorous structure or rapid sequences of notes with a staccato texture. Different orchestral compositions make use of different amounts of these elements and they blend together into a complex sound. The human voice also produces a complex sound that includes multiple pitches and it can produce rapidly changing or relatively stable “notes.” Different languages vary in how much use they make of these various capacities of the voice. Why languages differ in this way is an intriguing question.

Most of the characteristics that make one language different from another can be traced to differences that already existed in older forms of these languages. For example, English allows a string of three consonants to occur together at the beginning of a word or syllable, as in “stream” or “strong.” These words had very much the same shape in the Old English spoken in Anglo-Saxon England over 1,000 years ago. On the other hand, the Hawaiian language permits only a single consonant in the syllable-initial position, as in mahalo (roughly meaning “thank you” or “you’re welcome”). Since all the other Polynesian languages obey the same rule, we are fairly sure that the ancestral Proto-Polynesian language, estimated to be about the same age as Old English, also allowed only one consonant in this position.

However, despite the often-seen historical conservation of older traits, languages are also constantly changing due to internal processes or the influence of outside factors. Old English used to allow other clusters of consonants at word-beginnings, such as /kn-/ and /wr-/. These clusters are still represented in the spelling of words like knight and knave, and write and wring, but these words are pronounced the same as night, nave, right and ring. Over time, the first consonant was pronounced more weakly and eventually was lost altogether. On the other hand, Modern English includes new consonant clusters that did not occur in Old English because it has borrowed words from other languages with these new combinations, such as /sf-/ in sphere or sphincter.

It is likely that the environment in which a language is spoken also features among the outside factors that shape the direction of language change. There is evidence, particularly from the study of the songs of numerous species of birds, that vocal communication in animals is partially adapted to work more efficiently in the local ecological and climatic conditions. An overall survey of much work on bird song found that birds living in more closed environments, such as forests, on average used a lower pitch range and less timing variation in their songs than birds living in open environments, such as a prairie or savanna. 

The explanation offered for this is that the closed environment is less efficient at transmitting higher pitched and more rapidly changing elements of a song, so their use is reduced in this environment. The closed environment necessarily produces more of an effect technically known as acoustic scattering. This breaks up the coherence of transmitted sound and particularly affects rapidly changing sounds and those that depend on the higher pitches for their identity. 

Broadly speaking, in human language the consonants are those elements that have more rapid changes and depend on higher pitched elements to be identified by a listener. Vowels are elements that are more sustained and can be recognized from their lower pitched components. So we might expect that languages spoken where there is more acoustic scattering would tend to make less use of consonants and rely more on vowels to get their message across. A global study of over 600 languages has found confirmation of this idea. Working from maps showing mean annual temperature, mean annual precipitation (rain- and snowfall), and maximum annual tree cover, the study correlated the average values for these properties over the areas where the languages are spoken with the languages' sound patterns, and found that a significant part of the variation in sound patterns across these languages is predicted by the environmental conditions. Specifically, the more tree cover and rainfall an area has (these two factors are very closely correlated with each other), the less the language tends to rely on consonants. Both the number of different consonants the language uses and the extent to which they combine in strings (as they do in English) are lower in languages traditionally spoken in wetter, more tree-covered areas. There is also a relationship between higher annual temperature and less use of consonants, as heated air can also be an acoustic scatterer.

There is, of course, no hard and fast rule that a language spoken in a hot, wet forested area will simplify its use of consonants. The Australian English spoken in the tropical forests of Northern Queensland is the same as that spoken in the dry deserts of Western Australia. But there does seem to be evidence that languages consistently spoken in areas where acoustic scattering is high rely rather less on consonants compared with those spoken in cooler, drier and more open environments.

This pattern may have arisen in either of two ways. If human language was originally simple in its sound structure, then it might be that languages in the hot, wet areas have simply preserved these simpler sound patterns from an earlier time. In other areas, languages were able to more easily tolerate internal developments that led to larger numbers of consonants and more complex syllable patterns. Alternatively, languages in the hot, wet areas may have simplified originally more complex patterns. We may envisage the process as a listener-induced effect; that is, when it is harder to hear that a certain type of sound is present in a word, the listener may think that the word was actually pronounced without the sound in question. Eventually, the simpler form becomes the established norm. Since we do not know the earliest states of human language, we cannot know which scenario is more probable, nor can we rule out a scenario that combines elements of both these ideas.

Ian Maddieson is an adjunct research professor in the department of linguistics at the University of New Mexico, and adjunct emeritus professor in the department of linguistics at the University of California, Berkeley.