The Experimental Study of Intonation in Mandarin Chinese
Since the seminal work of Chao (1929, 1932, 1933), a rich body of research, both descriptive and experimental, has advanced our understanding of the linguistic functions and physical properties of tone and intonation in Mandarin Chinese, especially the interaction between the two. The book by Maocan Lin reviewed here is a respectable contribution to this tradition, offering a detailed and carefully designed study of the acoustics and perception of word-, sentence- and discourse-level prosodic phenomena. Lin takes Chao’s insights as a point of departure and explicitly adopts the autosegmental-metrical (AM) model (Ladd, 2008) in his experimental study of the prosodic system of Mandarin, of which tone and intonation are an integral part. Many earlier findings discussed in the book, including Lin’s own, remain on the required reading list of any scholar with a serious interest in the phonetics and phonology of Mandarin Chinese prosody.
Linguists and speech scientists alike would benefit from the book’s historical perspective and its experimental approach. Practitioners who teach Chinese as a foreign language and researchers in second language acquisition could draw instructional insights from it or use it as an entry into the primary literature on Chinese tone and intonation, in support of acquisition-oriented research, which has gained strong momentum in recent years (Yang, 2011; Zhang, 2013). Some familiarity with general or Chinese linguistics is needed to follow the book’s line of argument, and readers without any background in acoustic phonetics may find the more technical discussions challenging.
Lin begins by laying out his general framework, approach and methodology in chapter 1, highlighting Chao’s classical view of the interaction of tone and intonation as ‘an algebraic sum’ (p. 1), realized as the superimposition of smaller waves (tones) on a larger wave (intonation). Chapter 2 summarizes results from production and perception studies of the four lexical tones, which are characterized by distinctive fundamental frequency (f0) patterns, in conjunction with optional onglides or offglides. The perception data suggest that the nuclear vowel of a syllable carries the acoustic information needed for reliable tone identification. This finding has implications for the alignment of the f0 targets of tones with the segmental string, a question revisited in chapter 3. Two separate studies, one using disyllabic words from Mandarin Chinese and one using disyllabic words from a Fujian dialect, both confirm that the vowel is the key carrier of tonal information in the segmental string. Chapter 3 also discusses patterns of tonal coarticulation in multisyllabic sequences, a topic that has received extensive attention from other researchers as well (Xu, 1997). The remainder of chapter 3 deals with word-level stress in disyllabic and trisyllabic words. A key aim of the experimental design is to minimize the influence of word-internal structure and contrastive focus on the location of word-level stress. The f0 contours show positional effects: the f0 range is expanded at the end of the word because the final low f0 point is lowered further. Syllable duration, too, appears to correlate strongly with position in the patterns studied. Based on his experimental study and other analyses, Lin suggests (chapter 4) that word-level stress falls on the final syllable when every syllable in the word carries one of the four lexical tones. A different pattern emerges when the second syllable of a disyllabic word bears the so-called ‘neutral tone’ (chapter 4).
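To make Chao’s ‘algebraic sum’ concrete, the following Python sketch adds syllable-sized tone contours (small waves) to a phrase-sized intonation line (large wave). It is a toy illustration under stated assumptions, not Lin’s model: the tone shapes follow Chao’s standard five-level tone letters (55, 35, 214, 51), but summing in semitones, the spacing of two semitones per level, the linear declination line and its slope, and all function names are hypothetical choices made only for illustration.

```python
# Toy sketch of Chao's "algebraic sum": small waves (lexical tones) are
# superimposed on a large wave (phrase intonation). Values are in semitones.
# ASSUMPTIONS (not from Lin's book): 2 st per Chao level, linear declination,
# 5 samples per syllable; all names are hypothetical.

# Chao tone letters for the four lexical tones (5 = highest, 1 = lowest).
TONE_LETTERS = {1: (5, 5), 2: (3, 5), 3: (2, 1, 4), 4: (5, 1)}
SEMITONES_PER_LEVEL = 2.0   # assumed step between adjacent Chao levels
POINTS_PER_SYLLABLE = 5     # assumed sampling density per syllable


def tone_contour(tone, n=POINTS_PER_SYLLABLE):
    """Interpolate a tone's Chao letters linearly over n samples and return
    offsets in semitones relative to mid level 3 (the 'small wave')."""
    letters = TONE_LETTERS[tone]
    contour = []
    for i in range(n):
        pos = i * (len(letters) - 1) / (n - 1)    # position along the letters
        k = min(int(pos), len(letters) - 2)       # index of the left letter
        frac = pos - k
        level = letters[k] + frac * (letters[k + 1] - letters[k])
        contour.append((level - 3) * SEMITONES_PER_LEVEL)
    return contour


def intonation_line(n, start=0.0, slope=-0.2):
    """Assumed phrase intonation (the 'large wave'): a straight line that
    changes by `slope` semitones per sample (negative = declination)."""
    return [start + slope * i for i in range(n)]


def algebraic_sum(tones):
    """Add the concatenated tone contours to the intonation line, sample by sample."""
    small = [x for t in tones for x in tone_contour(t)]
    large = intonation_line(len(small))
    return [s + l for s, l in zip(small, large)]
```

For example, `algebraic_sum([1, 4])` returns ten values: the level first-tone samples followed by the falling fourth-tone samples, each shifted down by the gradually declining line. Adding in semitones (a logarithmic f0 scale) is one common way to read ‘algebraic sum’; the review itself does not specify the domain.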
There are both syntactic and morphological restrictions on the distribution of syllables with the neutral tone. Moreover, these syllables are found to be much shorter in duration, and their vowels are often not fully realized. Lin concludes that neutral-tone syllables are more appropriately characterized as instances of weak stress than as a special type of tone sandhi. The stress pattern in such disyllabic words is clearly S(trong)-W(eak), and native speakers have almost unerring intuitions about it.
In the remaining three chapters, the focus shifts to prosodic phenomena at the sentence and discourse levels: focus-induced stress (chapter 5), intonation patterns associated with different sentence types (chapter 6), and a feature-based model of intonation (chapter 7). Lin argues that significant modifications of surface f0 contours in a prosodic or intonational phrase are found in metrically or prosodically prominent positions, such as on syllables carrying stress associated with narrow focus and, at the end of the phrase, in the form of a boundary tone. This idea is in line with the AM model of intonation. For example, narrow focus and broad focus modify the surface f0 contours of the associated syllable(s) differently, and different sentence-type (i.e. ‘functional’) intonations often correlate with different boundary tones at the end of prosodic domains. Based on previous studies and his own production data, Lin proposes a model of intonation in which the surface f0 contour of a prosodic phrase is modeled as the combined effect of focus stress and boundary tones. Experimental results indicate that narrowly focused syllables show a significantly higher f0 peak with the first, second and fourth tones, and a lower f0 trough with the third tone. Two intonational features are proposed to capture this difference: [RAISEH] and [LOWERL]. Broad focus does not raise f0 peaks, but it expands the pitch range at the end of the prosodic phrase. Declarative and interrogative intonations are distinguished by different boundary tones that either raise or lower the whole tone of the final syllable; the corresponding features are [RAISETONE] and [LOWERTONE]. Lin suggests that this feature-based model captures the significant pitch events related to focal prominence and boundary tones in Mandarin Chinese intonation.
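The feature-based model summarized above can be pictured as a set of local operations on f0 targets. The Python sketch below is an illustrative reading of that description, not Lin’s implementation. The feature names [RAISEH], [LOWERL], [RAISETONE] and [LOWERTONE], and the tones to which the two focus features apply, come from the review; the representation of a syllable as a short list of f0 target points in semitones, the shift sizes, the choice to move only the single highest or lowest target, and all class and function names are hypothetical. Broad-focus range expansion is not modeled, and the mapping from sentence type to boundary feature is left to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# ASSUMED shift sizes in semitones; the review reports directions, not values.
RAISEH_ST = 3.0
LOWERL_ST = 2.0
BOUNDARY_ST = 2.0


@dataclass
class Syllable:
    tone: int                      # lexical tone 1-4
    f0: List[float]                # f0 target points in semitones (hypothetical)
    features: List[str] = field(default_factory=list)


def focus_feature(tone: int) -> str:
    """Narrow focus: [LOWERL] for tone 3, [RAISEH] for tones 1, 2 and 4."""
    return "LOWERL" if tone == 3 else "RAISEH"


def apply_features(syllables: List[Syllable],
                   focus_index: Optional[int] = None,
                   boundary: Optional[str] = None) -> List[Syllable]:
    """Return a modified copy of the phrase; the input is left unchanged."""
    out = [Syllable(s.tone, list(s.f0), list(s.features)) for s in syllables]
    if focus_index is not None:
        syl = out[focus_index]
        feat = focus_feature(syl.tone)
        syl.features.append(feat)
        if feat == "RAISEH":
            # raise the highest target point (the f0 peak)
            i = max(range(len(syl.f0)), key=lambda j: syl.f0[j])
            syl.f0[i] += RAISEH_ST
        else:
            # lower the lowest target point (the f0 trough)
            i = min(range(len(syl.f0)), key=lambda j: syl.f0[j])
            syl.f0[i] -= LOWERL_ST
    if boundary is not None:
        if boundary not in ("RAISETONE", "LOWERTONE"):
            raise ValueError(f"unknown boundary feature: {boundary}")
        final = out[-1]
        final.features.append(boundary)
        # a boundary tone shifts the whole tone of the final syllable
        shift = BOUNDARY_ST if boundary == "RAISETONE" else -BOUNDARY_ST
        final.f0 = [x + shift for x in final.f0]
    return out
```

For example, applying `apply_features` to `[Syllable(1, [4.0, 4.0]), Syllable(3, [-2.0, -4.0, 2.0]), Syllable(4, [4.0, -4.0])]` with `focus_index=1` and `boundary="RAISETONE"` lowers the trough of the focused third-tone syllable from -4.0 to -6.0 and shifts the final syllable to `[6.0, -2.0]`. Keeping focus and boundary features as separate, composable operations mirrors the review’s description of surface f0 as the combined effect of focus stress and boundary tones.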
It is worth noting that Lin’s adoption of the AM model for the intonation of a tone language like Mandarin may point to a degree of universality in intonational structure. One of the reviewers has also successfully applied the model to account for the intonational patterns of wh-questions and yes-no questions, with and without focus, in Chaha, a Semitic language spoken in Ethiopia (Li, 2002).
The book includes four appendices. While the two word lists are useful and relevant to readers interested in the experimental setup, one may wonder whether, given the intended readership of a specialized book like this one, it was necessary to include brief discussions of the basics of speech production and analysis and of the inherent intensity of vowels.
As for the structure of the book, sections 3.2 and 3.3 on tonal coarticulation and the alignment of f0 targets, noted above, are thematically closer to chapter 2, whereas word-level stress, the topic of section 3.1, is the main focus of chapter 4. The rationale behind this arrangement of topics is not clear to us.
In sum, Lin’s book is not only a significant addition to the study of Mandarin Chinese intonation (Yuan, 2004; Liu and Xu, 2005), but it also has implications for pedagogy, acquisition research, and speech technology. In the pedagogical realm, the focus has traditionally been on teaching the four tones, the elements of Chinese phonetics and phonology with which learners tend to struggle the most. Attention to intonation has been scant at best, partly because textbooks and practitioners lack a formal framework that can effectively guide their teaching efforts and methods. Lin’s book provides a foundation for such a much-needed framework. In addition, we are seeing increased adoption of speech analytics in call centers (Masterson, 2013). A better understanding of the characteristics of tone and intonation, as outlined in Lin’s book, should help efforts to model the variation caused by a speaker’s emotional state, as well as the development of expressive text-to-speech systems for Mandarin Chinese.
Chao YR (1929): The study of Beiping intonation; in Milne AA (ed): The Camberley Triangle (Appendix). Shanghai, Zhonghua Bookstore.
Chao YR (1932): A preliminary study of English intonation (with American variants) and its Chinese equivalents. Ts’ai Yüan P’ei Anniversary Volume, Bulletin of Institute of History and Philology, suppl No 1, pp 105–156.
Chao YR (1933): Tone and intonation in Chinese. Bull Inst History Philol 4:121–134.
Ladd DR (2008): Intonational Phonology, ed 2. Cambridge, Cambridge University Press.
Li Z (2002): Focus, phrasing and tonal alignment in Chaha; in Csirmaz A, Li Z, Nevins A, Vaysman D, Wagner M (eds): Phonological Answers (and Their Corresponding Questions). MIT Working Papers in Linguistics, No 42, pp 195–215.
Liu F, Xu Y (2005): Parallel encoding of focus and interrogative meaning in Mandarin Chinese.
Masterson M (2013): NICE rolls out fraud prevention solution. Speech Technology Magazine, January 9, 2013.
Xu Y (1997): Contextual tonal variations in Mandarin. J Phonet 25:61–83.
Yang C (2011): The Acquisition of Mandarin Prosody by American Learners of Chinese as a Foreign Language (CFL); PhD dissertation, Ohio State University.
Yuan J (2004): Intonation in Mandarin Chinese: Acoustic, Perception, and Computational Modeling; PhD dissertation, Cornell University.
Zhang H (2013): The Second Language Acquisition of Mandarin Chinese Tones by English, Japanese and Korean Speakers; PhD dissertation, University of North Carolina at Chapel Hill.
(This article was originally published in Phonetica 2016; 73: 141–143, and is reprinted with permission.)
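Qiguang Lin (林其光) received his PhD from the Royal Institute of Technology (KTH), Sweden. He studied under Wu Zongji and Lin Maocan at the Institute of Linguistics and under Professor Gunnar Fant, and was for many years a colleague of Professor James Flanagan. He has worked at Fortune Global 500 companies such as IBM and Yahoo, as well as at start-ups in Silicon Valley, California, on the research, development and application of speech technology and big-data technology. In 2011 he co-founded Wuxi Baihu Technology Co., Ltd. (无锡百互科技有限公司).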