Contents and Abstracts (Vol. 18, 2023)

Source: Chinese Journal of Phonetics  Time: 2023-11-14

1 A REVIEW OF EXPERIMENTAL PHONOLOGY

CHEN Zhongmin

Abstract: 

This paper introduces the history of experimental phonology and its contribution to the field of phonology. I point out that experimental phonology has distinctive features in five respects. First, experimental phonologists view language as a phenomenon of nature, albeit a particularly complex one. Language as a cognitive system imputed to individuals is thus to be explained in terms of general facts about the physical world; in terms of specific capabilities of the human species which arose through evolution; and in terms of the interactions of the organism with its environment. Therefore, it is possible to study and explain phonological systems and the universal laws of phonological evolution by experimental methods. Second, on the basis of this viewpoint, experimental phonologists believe that the theories and methods of experimental phonology are as clear as those of mature scientific disciplines such as mathematics, physics and chemistry: hypotheses are proposed, data are collected and analyzed, hypotheses are verified or modified, a theory is established, and predictions are made again and again until the theory is more complete. Third, the scope of experimental phonology is broader than that of traditional phonetics or formal phonology. It includes not only the content of traditional phonetics, but also the neurocognitive mechanisms of speech production, transmission and perception, as well as the relationship between expression and posture information related to speech and social phonetics. Fourth, experimental phonology not only studies and describes the synchronic phonological system of a language, but also studies diachronic changes; that is, it studies and explains how the synchronic sound system was formed. Therefore, research in experimental phonology often includes both synchronic and diachronic content, and explains the formation of synchronic phonology by the natural rules of sound change. Fifth, experimental phonology tends to study the sound systems and types of the world's languages, as well as the general laws of sound evolution. These general and universal laws recur throughout the history of many languages that have no genetic relationship. In this paper, four types of studies in experimental phonology are presented to illustrate the differences between experimental phonology and formal phonology: 1. the tendency of obstruent consonants to be voiceless; 2. the formation mechanism of the sound system; 3. the explanation of emergent stops; 4. the gradient perception of sound categories. Finally, some achievements of experimental phonology in the study of Chinese are discussed. The article is divided into five parts: 1. The definition and history of experimental phonology; 2. The history of experimental phonology; 3. Examples of experimental phonology studies; 4. Some achievements of experimental phonology in China; 5. Conclusion.

Keywords:

Experimental phonology; Quantal theory; Categorical perception; Gradient perception; Exemplar model;  


2 ENHANCEMENT IN MANDARIN CHINESE

LI Zhiqiang

Abstract: 

According to the Speech Enhancement theory, there is a universal set of distinctive features that define the phonological contrasts observed in the languages of the world. Each of these features is defined by an articulatory action in producing the segment and by a particular acoustic pattern and auditory impression. In speech production, the perceptual saliency of a contrast specified by a feature is often enhanced by introducing articulatory gestures in addition to the defining gesture for the feature. These gestures strengthen the defining acoustic properties and/or introduce additional properties to increase the perceptual contrast of the feature. In this squib, we first discuss enhancing gestures in the production of Mandarin fricatives and then, following the enhancement-based analysis of the fricatives, revisit the phonological status of apical vowels. The three strident fricatives in Mandarin and their affricated cognates are distinguished by two distinctive features, [anterior] and [distributed]. In featural terms, the alveolar /s/ is [+anterior] while the retroflex /ʂ/ is [-anterior]. A similar contrast between /s/ and /ʃ/ in English is enhanced by the lip rounding gesture in the production of the [-anterior] fricative /ʃ/, but a different strategy is adopted in Mandarin because the lip rounding gesture is contrastive in defining unrounded and rounded pairs of fricatives. The vocal-tract configuration for /ʂ/ shows a somewhat retroflexed shape for the tongue blade, leading to a relatively longer front cavity, whose resonance is associated with F3. For the [+anterior] fricative /s/, the frication noise is centered at high frequencies in the F5 region. For the [+distributed] palatal fricative /ɕ/, a more fronted tongue body position is more compatible with the long constriction formed by the tongue blade against the palate. In light of the enhancement-based analysis, we propose that, despite the surface variations of the three vowels /ɿ/, /ʅ/ and /i/ following /s/, /ʂ/ and /ɕ/ respectively, the underlying representation of /ɿ/ and /ʅ/ is /i/. In speech production, the processes /i/→/ɿ/ and /i/→/ʅ/ result from avoiding the palatalization of /s/ and /ʂ/ in the context of /i/, so that the phonological contrast among /s/, /ʂ/ and /ɕ/ is preserved. In addition to acoustic data and the synchronic phonological analysis, evidence from diachronic studies is also cited in support of the current proposal. We also discuss another case of speech enhancement in Mandarin, in which the place feature of the nasal coda in a syllable is enhanced by the tongue body movement of the low vowel in the nucleus. It has been observed that the two nasal endings /n/ and /ŋ/ are often weakened or even deleted in the acoustic signal when they precede a vowel-initial syllable, as in /ʂan ɑʊ/. In this example the actual pronunciation is /ʂã ɑʊ/, in which the nasal ending is deleted and the nuclear vowel is nasalized. More importantly, the place feature of the nasal, [-back], is “transferred” to the nuclear vowel, as evidenced by the F2 movement, which is an indication of the tongue body movement. The non-contrastive enhancing gesture, in this case, has become the defining gesture for the distinctive feature when the nasal closure is missing from the acoustic signal. In summary, we have reviewed two examples of enhancement in Mandarin and offered new perspectives on them. By comparing them with examples in English, we observe that the strategies used in different languages to enhance the same contrast defined by a distinctive feature are constrained by language-specific phonological patterns.

Index Terms:

Speech enhancement; Fricatives; Apical vowels; Low vowels; Nasal endings;


3 ACOUSTIC PARAMETERS AND FACTORS AFFECTING VOICE ATTRACTIVENESS

LIU Yan, WANG Youlin

Abstract:

In recent years, voice attractiveness, as an interdisciplinary field in linguistics and psychology, has become a hot topic for researchers all over the world, and it has also been set as an independent research topic at the INTERSPEECH conference. However, Chinese researchers have produced few related results, and these are basically limited to the field of psychology. For the convenience of domestic scholars in this field, this paper reviews recent studies of voice attractiveness at home and abroad from the perspective of its six major acoustic parameters and the corresponding physiological, psychological, and social factors that affect them.

Results from research on voice attractiveness assessment are summarized as follows. (I) Fundamental frequency (F0). High scores are given to male voices with low F0 and female voices with high F0. (II) Vowel formants and their dispersion. High scores are given to male voices with low formant frequency and dispersion and to female voices with high formant dispersion. (III) Duration. Raising speech rate appropriately improves voice attractiveness. (IV) Amplitude. High scores are given to speech with significant amplitude variation in lectures. (V) Phonation types. High scores are given to voices with a breathy feature. (VI) Voice distance-to-mean. A distance approaching the average value denotes high voice attractiveness.

Factors affecting the assessment of voice attractiveness include physiological, psychological, and social ones. For instance, physiological factors such as facial attractiveness and sex hormones are positively correlated with voice attractiveness; the same voice is given different scores depending on the listener's gender; languages with different cultural backgrounds differ in their aesthetic foundations; and people with higher social status receive higher scores. As a series of factors are involved in research on voice attractiveness, acoustic parameters should be considered together with physiological, psychological and social factors. At present, research on voice attractiveness still has some inadequacies, to name only a few: unbalanced application of acoustic parameters, a lack of research methods, and little research output from cross-language and cross-cultural perspectives. To sum up, research on voice attractiveness involves complex issues in many fields. More researchers are needed to approach it from various angles with interdisciplinary and multiple methods.

Keywords:

Voice attractiveness, Acoustic parameters, Relevant factors


4 A REVIEW OF CHINESE RHYTHM RESEARCH 

YIN Zhigang

Abstract:

This article systematically summarized the studies of Chinese rhythm and discussed the rhythm type of Chinese. The article first defined the concept of rhythm, which refers to the pattern in which salient language elements appear regularly in the time sequence. The concept contains two parts: the first part is the contrastive feature of the salient elements; the second part is the regularity of the salient elements in the time sequence, or the organizational pattern of the rhythmic units. The main body of the article presented the studies of these two parts, as well as rhythmic computational models and language rhythm types.

In the introduction of studies related to the first part of the rhythm concept (contrastive features), studies based on the stress feature, on pausing and delaying features, and on the tone feature were introduced (see section 2). (1) Studies based on stress features are the mainstream of language rhythm research. Stress is the most important contrastive feature that forms the rhythm of English. However, it is still controversial whether Chinese has word stress apart from neutral-tone (and lightly pronounced) syllables. This part focused on different views of Chinese word stress in phonological and phonetic studies, and also briefly introduced the concept of utterance stress. In addition, the acoustic features that influence the perception of stress were discussed, with the most important features being F0 and duration, followed by intensity. (2) Studies based on pausing and delaying features suggest that the core feature of Chinese rhythm is not stress but pausing and delaying. (3) Studies based on the tone feature considered tone to be the fundamental feature of Chinese, and possibly the core feature of Chinese rhythm. Some studies connected the “tonal/non-tonal” feature with “light/weight” features, which provided new ideas for the study of Chinese rhythm.

In the introduction of studies related to the second part (the regularity of the salient elements in the time sequence), the linear models and hierarchical models of rhythmic units were introduced, and some other rhythm-related research, such as work on speech rate, was also presented (see section 3). (1) In the section on linear models, the foot, beat, and linear block models were introduced. (2) In the section on hierarchical models, some hierarchical models of Chinese rhythm were introduced. These studies were mainly based on Selkirk's prosodic hierarchy theory, and a typical hierarchy was: syllable, foot / prosodic word, prosodic phrase, intonational phrase, utterance. (3) This part also introduced some studies of issues that were closely related to language rhythm, such as speech rate, pau-

Keywords:

Rhythm, Contrasting feature, Organizational model, Tone, Stress 


5 NEUTRAL SPEECH PROSODY IMPOSES PERCEPTIVE ALTERATIONS ON SEMANTIC EMOTION: EVIDENCE FROM A LARGE-SCALE AFFECTIVE RATING EXPERIMENT

TANG Enze, GONG Jie, GUAN Jingjing, DING Hongwei

Abstract: 

Emotion words, as indispensable experimental materials in psycholinguistics, have been selected as the stimuli of auditory and cross-modal experiments based on affective norms collected in the visual modality (written form carried by characters or letters). However, the influence of the default neutral speech prosody carrier on lexical emotion perception has been neglected. Therefore, this study aims to reveal the asymmetry of cross-modal lexical emotion perception through a large-scale affective rating experiment, and to connect the behavioral perception with the acoustic parameters of the two speakers involved. Specifically, this study collected affective ratings for 195 two-character Chinese emotion adjectives from 361 university students in both visual and auditory modalities, measuring familiarity, valence, arousal, dominance and intensity. Results indicated that valence showed a convergence toward neutrality in the auditory modality, but emotion intensity remained unchanged. Acoustic analyses showed that the speech-engendered perception differences in dominance and arousal were subject to speaker-specific differences in duration, F0 variation and voice quality. This research is among the first to explicate the asymmetrical nature of cross-modal perception and to challenge the accepted equivalence between affective comprehension in the visual modality and in the default auditory condition carried by neutral intonation. Researchers are advised to control the acoustic features even of neutral auditory stimuli to ensure that they function as fillers or controls, and future emotional speech databases also need to report affective norms in both modalities.

Keywords:

Neutral speech prosody; Semantic emotion; Affective rating; Cross-modal;


6 A STUDY ON CROSS-LANGUAGE BACKGROUND BILINGUALS' PERCEPTION AND COMPREHENSION OF PROSODIC FOCUS-MARKING IN MANDARIN

LIU Zenghui, JING Jia

Abstract:

This study uses a prosodic focus-marking decoding task to test cross-language background bilinguals' perception and comprehension of prosodic focus-marking in Mandarin. It aims to examine the difference between bilinguals and monolinguals in decoding prosodic focus-marking and to discuss language-universal and language-specific characteristics in the process of decoding prosodic focus-marking. Thirty native Mandarin speakers and 70 cross-language background bilinguals participated in the experiment. Specifically, 28 Bai-Mandarin, 12 Yi-Mandarin, 9 Tibetan-Mandarin, 12 Uygur-Mandarin, and 9 Naxi-Mandarin bilinguals were included in the present study. All the bilingual participants were college students or teachers who started their Mandarin education at ages 4 to 6. All the bilinguals were screened by a questionnaire concerning language use and language attitude. A comprehension task was run on the E-prime platform. In the experiment, the position of focus and the focus type were controlled to examine the effect of focus and focus types. Auditory stimuli were played to all the participants. The participant's task was to choose the most suitable response as the answer in Question-Answer dialogues. Reaction times and correct responses were automatically collected by the E-prime platform. Mixed-effect modeling was used to analyze reaction time, and binomial logistic regression was used to analyze response accuracy.

The results showed that Mandarin-speaking monolinguals could distinguish different focus conditions. They spent the longest time on broad focus conditions, for which response accuracy was also the lowest. However, Mandarin monolinguals showed the highest response accuracy in identifying sentence-initial and medial focus. With all the bilinguals treated as one group, the results showed that bilingual speakers could distinguish different focus conditions in Mandarin. Bilingual speakers also spent the longest time identifying broad focus, with the lowest response accuracy. In addition, bilinguals showed the highest response accuracy in identifying sentence-initial and medial focus. Unlike Mandarin-speaking monolinguals, bilinguals showed no difference between sentence-initial and sentence-medial focus in reaction time or response accuracy. Taking the reaction time and response accuracy results together, the present study showed that Bai-Mandarin and Tibetan-Mandarin bilinguals were similar to Mandarin-speaking monolinguals, while Yi-Mandarin, Uygur-Mandarin, and Naxi-Mandarin bilinguals were quite different from Mandarin-speaking monolinguals in comprehending prosodic focus-marking in Mandarin.

The results indicate that both language-universal and language-specific properties play important roles in decoding prosodic focus realization. The present study reveals the effect of language background on the perception and comprehension of prosodic focus-marking. Specifically, bilinguals with different language backgrounds differ in the decoding of prosodic focus-marking. However, the difference in comprehending prosodic focus between Yi-Mandarin and Bai-Mandarin bilinguals cannot simply be attributed to the impact of language background, as the native languages of Yi-Mandarin and Bai-Mandarin bilinguals are quite similar in relying mainly on durational variation for encoding focus. In addition, the present study reveals a language-universal property. Specifically, all the bilingual speakers show the highest response accuracy and the shortest reaction time for sentence-initial and sentence-medial focus, which might be due to the presence of post-focus compression in these positions.

Keywords:

Cross-language, Prosodic focus-marking, Perception and comprehension


7 ACOUSTIC REALIZATION OF TONES IN DISYLLABIC WORDS IN GUNGBE

SOVI-GUIDI Wachinou Lionnel Pyrrhus, CAO Wen

Abstract:

Gungbe is a tonal language spoken in the southern part of the Republic of Benin in West Africa, and is one of the Gbe languages of the Kwa branch. The word “Gbe” means “language”, and the word “Gungbe” refers to “the Gun language”. Gbe languages comprise all languages or dialects with the lexeme “gbe” (e.g., Gungbe, Fongbe, Gengbe, Wem?gbe). Gbe-speaking communities live in West Africa, including the southern part of the Volta region in Ghana, the southern part of Togo, the southern part of Benin, as well as different localities of Ogun State and Lagos State in Nigeria. There are two underlying tones in Gungbe: the high tone (H) and the low tone (L). The other tones, i.e., the mid tone (M) and the rising tone (LH), are realizations of these two tones. In Gungbe, depressor consonants are voiced consonants, and the influence of the depressor consonants on the high tone is evident.

This study focuses on the tone realization of disyllabic words in Gungbe based on a phonetic experiment. The variety under study is spoken in Porto-Novo in Benin. The Gungbe disyllabic word data analyzed in this study came from fieldwork with four native speakers in Porto-Novo (two men and two women), aged between 25 and 36, who grew up in a Gungbe-speaking environment. This study investigates two disyllabic structures, i.e., VCV and CVCV. The initial V in VCV can only carry a non-high tone, and there are 8 combinations with 3 words for each combination. CVCV has 16 tone combinations with 6 words for each combination. The experiment used Praat to record and annotate the stimuli, and used a script to extract fundamental frequency values at 10 equidistant points in each syllable. The fundamental frequency values were then converted into semitone values.
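
The extraction and conversion steps described above can be sketched as follows. This is only an illustration and not the authors' script: the function names, the linear interpolation between pitch samples, and the 100 Hz reference frequency are all assumptions, since the abstract does not state them.

```python
import math


def equidistant_points(track, n_points=10):
    """Sample a pitch track at n_points equally spaced times.

    track: list of (time_s, f0_hz) pairs sorted by time (at least two pairs).
    Values between samples are linearly interpolated (an assumption).
    """
    start, end = track[0][0], track[-1][0]
    step = (end - start) / (n_points - 1)
    sampled = []
    j = 0
    for k in range(n_points):
        t = start + k * step
        # Advance to the segment of the track that contains time t.
        while j < len(track) - 2 and track[j + 1][0] < t:
            j += 1
        (t0, f0), (t1, f1) = track[j], track[j + 1]
        w = (t - t0) / (t1 - t0)
        sampled.append(f0 + w * (f1 - f0))
    return sampled


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz.

    The 100 Hz reference is a hypothetical choice for this sketch.
    """
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_hz / ref_hz)


# Hypothetical syllable: a rise from 100 Hz to 190 Hz over 0.9 s.
syllable = [(0.0, 100.0), (0.9, 190.0)]
contour_st = [hz_to_semitones(f) for f in equidistant_points(syllable)]
```

Because the semitone scale is logarithmic, a doubling of F0 always corresponds to 12 semitones, which makes contours from speakers with different pitch ranges easier to compare.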

The experimental results show that in the VCV structure, the tone of the first syllable is predictable: it is a mid tone (M). The tone of the second syllable is not affected by the tone of the first syllable, so it does not change. The tone realization of the CVCV structure is much more complicated. In the CVCV structure, whether the high tone (H) after a voiceless consonant is on the first syllable or the second, its realization does not change: the high tone (H) is still realized as high level. Whether the consonant of the low tone (L) is voiced or voiceless, the low tone (L) becomes a mid tone (M) on the first syllable of the CVCV structure. The mid tone is thus a realization of the low tone (L). In Gungbe, the rising tone (LH) is the realization of the high tone (H), which occurs only when the initial of the syllable is voiced. It also occurs when the initial consonant of the first syllable of the disyllabic word is voiced. When the rising tone (LH) is on the second syllable of the CVCV structure, it is no longer affected by the initial consonant and becomes a high tone (H). In other words, the tonal realization of the rising tone (LH) changes from the original rise to a high level on the last syllable of the CVCV structure.

Keywords:

Gungbe, Disyllabic words, Tones, Acoustic realization


8 PROSODIC FOCUS OF STRUCTURE “LIAN NP DOU VP” AT THE MIDDLE OF A SENTENCE

WEN Baoying, DONG Weiyan

Abstract:

The Lian-sentence is a key and difficult point in the study of special sentence patterns in Chinese. The common form is “Lian... Dou / Ye...”, which involves three very important words: “Lian”, “Dou” and “Ye”. As a function word in Chinese, “Lian” is the subject of much controversy regarding its part of speech and grammaticalization process, which has not yet been settled; “Dou” and “Ye”, as very important adverbs in Chinese, also bring many complex research problems to sentence patterns. The grammatical significance of the Lian-sentence lies in emphasizing polarity and implicit comparison and in conveying implicit meanings, which has been basically recognized by the academic community. But there are great differences among previous studies. Most academic research on the Lian-sentence has focused on grammar, semantics and pragmatics, and lacks analysis and description from the perspective of phonetics.

Based on the concept of prosodic pattern, the present paper explored the prosodic focus of the Lian-construction in the middle of a sentence, whose structural form is “Lian NP Dou VP”. We designed two sets of experimental corpora for comparison, and measured pitch, duration and intensity for 10 native Chinese speakers, to explore the prosodic focus pattern and to take a closer look at the relation among the three elements of prosodic focus. The results showed that the Lian-construction (Lian NP) is the focus of this sentence pattern, with significant expansion in pitch and duration and enhanced intensity, showing evident contrast. Because the Lian-construction in the middle of a sentence carries a larger information load, it turns into a contrastive focus. Although this construction does not occupy the topic position at the beginning of the sentence, the focus effect still occurs, with expanded pitch range, extended duration and enhanced intensity, forming a stronger contrast. Moreover, the post-focus position shows compressed pitch range, decreased duration and reduced intensity, and the natural focus changes into the highlight of focus, representing a stronger contrast. According to quantitative analysis of prosodic features, “Lian” participates in focus performance as a focus-sensitive operator and has a small impact on the pre-focus component. “Dou” participates in a more limited focus performance in the middle of a sentence, inferior to the evident focus performance of “Lian”, with no significant effect on pitch range, duration or intensity. The NP construction is the focus of the sentence pattern, as the correlative component of “Dou”, showing focus performance in pitch, duration and intensity, while the post-focus position shows compressed pitch range, decreased duration and reduced intensity. The three elements of prosodic focus behave differently in focus position and post-focus position. They are relatively synchronized in focus performance, whereas in post-focus position pitch and intensity behave consistently while duration is more independent, indicating that the three elements of prosodic focus are neither equivalent nor balanced in sentences with a focus construction.

Keywords:
Lian-construction, Focus, Contrast


9 A CONTRASTIVE STUDY ON PHONETIC PROMINENCE BETWEEN MANDARIN CHINESE AND ENGLISH TRISYLLABIC WORDS  

GUO Jia, CUI Sihan

Abstract:

This study makes a contrastive analysis of word-level phonetic prominence in Chinese and English trisyllabic words with different types of morphological structure, exploring the commonalities and differences in the relative prominence between adjacent syllables within words by taking pitch range ratio, duration ratio and energy ratio into consideration. First, regarding the word phonetic prominence pattern, adjacent syllables do demonstrate phonetic prominence in both languages, yet the degree of prominence varies: phonetic prominence patterns of English words are obvious, while Chinese words present only a slight tendency toward phonetic prominence. Second, all English trisyllabic word patterns are consistent with English stress patterns, while the slight right-edge prominence pattern of Chinese words may be a boundary tendency. Third, regarding the ranking of the acoustic cues in signaling stress, English words mainly present the ranking intensity (amplitude-integral) > pitch range ≥ duration, while the ranking in Chinese words is duration > intensity (amplitude-integral) ≥ pitch range.

Keywords:

Word stress; Phonetic prominence; Trisyllabic words; Morphological structure;  


10 A PHONETIC STUDY OF DISYLLABIC TONE SANDHI IN WUZHI DIALECT  

ZHU Yuzhu, LI Aijun

Abstract:

This paper presented a detailed analysis of disyllabic tone sandhi and coarticulation in Wuzhi dialect, based on experimental production data of 9207 items collected from 11 speakers. In the current study, we focused on the tone sandhi of disyllabic words and the role of coarticulation in this process. The experimental procedure involved using xRecorder to record all the items, automatic segmentation of sound files by xSegmenter, and manual proofreading of TextGrid files. Praat scripts were then used to extract detailed fundamental frequency values, and the pitch contours of 10 disyllabic tone sandhi combinations were drawn and analyzed. To avoid unclear boundaries when identifying tone sandhi, the distinction standards used the actual pitch value as the main indicator, including tonal shape and tonal register, supplemented by native-speaker listening judgments for reference. In this study, we proposed two criteria for identifying tone sandhi. One is that when there is a significant change in the pitch contour, the tonal combination involved can be directly identified as tone sandhi. The other is a significant change in tonal register from H to L or from L to H. We also analyzed the phonological pattern of disyllabic tone sandhi in Wuzhi dialect and provided a phonological explanation formula. Additionally, we compared the duration of disyllabic tone sandhi, and a coarticulatory analysis was conducted to examine the relationship between duration and pitch contour. It was found that regressive assimilation is more significant than progressive assimilation, that is, the second syllable affects the first one more than the reverse, so an asymmetric relation has been identified. The duration of T1 - T5 in disyllables is shorter than that in monosyllables. Furthermore, in disyllabic tones, the duration of T1 - T5 is shorter in the first syllable than in the second syllable. The impact of the second syllable on the first syllable in T1 - T5 + X combinations is greater than that of the first syllable on the second syllable in X + T1 - T5 combinations, indicating an asymmetric relationship. The study also indicates that the pitch value at the boundary of the disyllabic tone sandhi greatly influences the pitch contour: the closer an F0 point is to the boundary, the more it is affected by it. Specifically, in the T1 + X combination, the tone value changes from 44 to 42 when T1 is combined with T4, due to the influence of the low starting point of the second syllable T4. Similarly, in the T2 + X combination, the tone value changes from 31 to 33 when T2 is combined with T3, due to the effect of the high starting point of the second syllable T3. The minimal pitch of the disyllabic word generally appears only at the end of the second syllable, and the position of the pitch peak depends on the highest pitch of the citation tone in the disyllabic tonal combination. Whether in the T1 - T5 + X combination or the X + T1 - T5 combination, the highest F0 point is mainly located at T3, with six groups showing the highest F0 point in the first syllable and four groups in the second syllable.

Keywords:

Wuzhi dialect, Disyllabic tone, tone sandhi, Tonal coarticulation, Acoustic analysis


11 ANALYSIS OF THE CHARACTERISTICS OF INITIALS AND FINALS IN MANDARIN FROM THE PERSPECTIVE OF ACOUSTIC DISTANCE

HUANG Wei, RAN Qibin  

Abstract:

In previous studies on the similarity of the initials and finals in Mandarin, researchers have typically calculated perceptual distances, while the acoustic distances among those phonemes have been less studied. In addition, the initials have been investigated more often than the finals in terms of both perceptual distance and acoustic distance. While most of these studies have analyzed the initials and finals separately, only a limited number have compared the initials and finals on the same level, which is considered useful for understanding the characteristics of the initials and finals in Mandarin.

In this paper, the initials and finals in Mandarin were placed in the context “X-Y-X”, where X could be one of the six monophthongs (i.e., /a o ɤ i u y/), and Y could be any of the 21 initials and 39 finals in Mandarin. We collected recordings from four speakers (two males and two females) who were asked to read the segmental combinations in a sound-proofed laboratory. All speakers were trained students in phonetics. The Dynamic Time Warping algorithm was used to calculate the acoustic distance (based on Mel-frequency cepstral coefficients) of the initials and finals in Mandarin. Based on the resulting acoustic distance matrix, the Average Linkage (between-group) method and the Neighbor Joining method were used for clustering, and a two-dimensional Principal Component Analysis was carried out. The clustering results of the Average Linkage method showed that sounds with the same place of articulation or similar structure often gathered at one terminal node, but the inter-speaker variability of nasal endings was relatively large.
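
As a rough illustration of the distance computation described above, the following sketch implements a textbook Dynamic Time Warping distance between two sequences of feature frames. It is not the authors' implementation: the Euclidean frame distance, the symmetric step pattern, the absence of length normalization, and the toy one-dimensional frames standing in for MFCC vectors are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the cheapest warping path between two sequences.

    seq_a, seq_b: lists of frames, each frame a list of floats
    (for example one MFCC vector per analysis window).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of seq_b repeated
                cost[i][j - 1],      # frame of seq_a repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical one-dimensional "frames" in place of real MFCC vectors.
sound_x = [[0.0], [1.0], [2.0]]
sound_y = [[0.0], [1.0], [1.0], [2.0]]  # same shape, slower in the middle
sound_z = [[1.0], [2.0], [3.0]]
```

Computing `dtw_distance` for every pair of sounds would give a symmetric distance matrix of the kind that clustering methods and PCA take as input. Time warping lets two tokens of different durations be compared frame by frame, which is why `sound_x` and `sound_y` come out at distance zero here.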

The results of PCA showed that the first component was mainly related to the loudness or intrinsic amplitude of sounds, and the second component mainly reflected the additional effect of frequency and duration of energy in the spectral energy concentration region. The results of Neighbor Joining clustering showed that the boundary between the initials and the finals is clear, and the initials are further divided into two categories according to whether or not they are sonorants (i.e., m, n, l, r in this study). The place and manner of articulation alternately played a leading role in the classification of non-sonorants. Stops were always completely separated from affricates and fricatives. For affricates and fricatives, the role of aspiration was relatively unimportant. In the category of finals, the Qichihu finals and the Cuokouhu finals were clustered into one group, while the Kaikouhu finals and the Hekouhu finals were clustered into another group. The Kaikouhu finals and the Hekouhu finals were often divided into different sub-categories according to the main vowels. Last but not least, we showed that the Neighbor Joining method was better at clustering the Chinese initials and finals than the Average Linkage (between-group) method.


12 THE TYPOLOGY OF VOICELESS FRICATIVES ACROSS CHINESE DIALECTS 

ZHANG Jialin, LI Mingxing

Abstract: 

This study examines the typology of voiceless fricatives across 201 Chinese dialects, focusing on their inventories, their place contrasts in two vowel contexts, [_i] and [_a], and their phonotactics in terms of their combinations with the two vowels. The results show an average of 4.11 fricatives per dialect, with [s f ɕ x] as the most frequently observed. As for fricative inventories, [f s ɕ x] is the most frequent and [f s ɕ ʂ x] the next. For voiceless fricatives, the [_a] context is generally more likely to license place contrasts than the [_i] context. In terms of phonotactics, fricatives at different places show different patterns in their combinations with vowels, with [ɕ ʃ] most frequently followed by [i] and [f s ʂ x h] by [a].
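The summary statistics reported above (mean inventory size, most frequent segments, most frequent inventories) could be computed along the following lines. This is only a sketch: the four dialect entries are invented for illustration and are not taken from the 201-dialect survey, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data, NOT from the survey described in the abstract.
inventories = {
    "dialect_A": {"f", "s", "ɕ", "x"},
    "dialect_B": {"f", "s", "ɕ", "ʂ", "x"},
    "dialect_C": {"f", "s", "ɕ", "x"},
    "dialect_D": {"s", "ɕ", "h"},
}

def mean_inventory_size(invs):
    # Average number of voiceless fricatives per dialect.
    return sum(len(segs) for segs in invs.values()) / len(invs)

def segment_frequency(invs):
    # Number of dialects in which each fricative occurs.
    return Counter(seg for segs in invs.values() for seg in segs)

def inventory_frequency(invs):
    # Number of dialects sharing exactly the same inventory.
    return Counter(frozenset(segs) for segs in invs.values())

top_inventory, top_count = inventory_frequency(inventories).most_common(1)[0]
```

Using `frozenset` as the counter key makes two dialects count as the same inventory regardless of the order in which their segments are listed.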

Keywords: 

Voiceless fricative; Place contrast; Phonotactics; Chinese dialects;


13 AN ACOUSTIC ANALYSIS OF MID VOWELS IN SIPING DIALECT OF CHINESE

TANG Anqi, JIMÉNEZ Jesús
 

Abstract:

In this paper, we analyzed the mid vowels found in the Chinese spoken in Siping (Jilin province, northeastern China). The study has two main objectives: first, to characterize the acoustic realization of these vowels; second, to discuss which phonological interpretation best fits the attested variants. To study the mid vowels, we recorded a group of six young female speakers of the Siping variety with a similar cultural background. The vowels appear in eight different contexts. In open syllables: after a palatal consonant (yē “coconut”), after a velar consonant (gē “brother”), after a retroflex alveolar consonant (shē “luxurious”), and after a labial consonant (pō “hillside”). In closed syllables: before a front glide (gēi “to give”), before a back rounded glide (gōu “ditch”), before an alveolar nasal (gēn “to follow”), and before a velar nasal (gēng “to change”). All the words were recorded inside the carrier sentence wǒ shuō _ dā yí cì “I say _ dā once”, to obtain a sample as homogeneous as possible both segmentally and tonally. The subjects were asked to read each sentence aloud seven times. All in all, we analyzed a sample of 336 vowels: 6 speakers × 7 repetitions × 8 contexts. The vowels were manually segmented and labeled with Praat, taking the spectrogram and the intensity as acoustic cues. A Praat script was used to extract the duration of the whole segment and to measure, at the center of each vowel, the intensity and the first two formants, which were normalized following the Watt & Fabricius (2002) procedure. With these data, one-way analysis of variance (ANOVA) tests were carried out, taking the extracted parameters as the dependent variables and the vocalic context as the independent variable.
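As a rough illustration of the statistical step described above, the following sketch computes the F statistic of a one-way ANOVA from first principles, with one group of measurements per vocalic context. The function name and the numbers are hypothetical, not data from the study, and the sketch says nothing about which software or post-hoc tests the authors used.

```python
def one_way_anova_f(groups):
    # groups: one list of measurements per level of the factor
    # (here, one list per vocalic context).
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up values for three contexts, three tokens each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)

# Sample size of the design stated in the abstract.
n_tokens = 6 * 7 * 8
```

With eight contexts and 336 tokens, such a test would have 7 and 328 degrees of freedom, assuming every recorded token entered the analysis.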

The results showed no differences in the intensity of the vowels. As for duration, vowels in closed syllables tend to be longer than vowels in open syllables, as expected. Finally, the normalized formant data indicate up to five different segments: [o] (context gōu), [ɤ] (contexts gēn, gē, shē, and pō), [ə] (context gēng), [ ] (context yē), and [e] (context gēi). According to the first normalized formant, the variants display two degrees of openness, with [o] and [e] slightly more closed than [ɤ], [ə], and [ ]. Since the two most closed vowels, [o] and [e], appear before the glides [w] and [j], their relative closeness can be attributed to the assimilatory influence of these segments. On the other hand, although the statistical test distinguishes between [ ] and [e], these sounds can be considered variants of the same vowel, whose openness depends on its length, with the longest segment, [ ], being the most open vowel.

As for the place of articulation of the segments, based on the second normalized formant, there are four distinctions, ranging from the back rounded vowel [o] to the front vowels [ ] and [e], with [ɤ] closer to [o] and [ə] closer to [ ] and [e]. The back unrounded vowel [ɤ] displays small differences in place of articulation in the contexts gēn, gē, shē, and pō, but the test interprets these variants as occurrences of the same vowel, realized as slightly fronted (in the context gēn) or slightly backed (in the context pō) with respect to a central reference in the contexts gē and shē. The realization of the vowel in the context pō approximately like the vowel in gē, and unlike the back rounded segment in gōu (namely, as a back unrounded vowel [ɤ]), is a typical feature of the Chinese spoken in northeastern China (see, among others, Cai Yue, 2021). As for the variants [ɤ] (contexts gēn, gē, shē, and pō) and [ə] (context gēng), which the statistical test treats as different, their distance is similar to the separation found between the allophones of the vowel /a/ in the contexts gān “to dry” and gāng “just”; hence, they could be considered variants of the same vowel as well.

The five variants in the Siping variety occur in complementary distribution and, therefore, could be derived from a single mid vowel, as suggested by some researchers (see, for instance, Cheng, 1973, and Duanmu, 2007). In our case, /ɤ/, which is the variant appearing in most open syllables, would be the best candidate from which to derive the other pronunciations. However, the great acoustic distance between some of these variants makes it more plausible to interpret them as realizations of different mid phonemes, which is the most common view among Chinese researchers (see, for instance, Wang, 1983; Tian, 1996; Huang & Liao, 2002; Shao, 2007, and Liu, 2015). According to our data, there are three phonemes: a front unrounded vowel /e/, with two contextual variants [ ] and [e]; a back unrounded vowel /ɤ/, with a more fronted variant [ə] and a more backed variant [ɤ]; and a back rounded vowel /o/. This Siping three-vowel system is, hence, defined by the rounding of the segments and by their place of articulation, with height differences mostly dependent on the length of the variants, as determined by their syllabic distribution.

Keywords:

Mid vowels, Siping Chinese, Formants, Duration, Intensity


14 A STUDY ON THE VOWEL PATTERN OF OGELED-MONGOLIAN IN YILI

ZHAO Chunming

Abstract:

The sound patterns of languages, especially those of the ethnic minorities in China, have been a hot research area in phonology and phonetics. Adopting an experimental approach, the current study explored the vowel patterns of Ogeled-Mongolian, mainly spoken in Yili, Xinjiang Uygur Autonomous Region, with a specific focus on their phonological and distributional features. The vowel system of the Ogeled sub-dialect of Yili Mongolian is basically the same as that of the Torghut sub-dialect in Xinjiang. In the initial syllable of the word, there are more short vowels in the Ogeled sub-dialect than in the Torghut sub-dialect, and the vowel /ʉ/ in the Ogeled sub-dialect is an allophone of the vowel /y/. The long vowels in the initial syllable are /ɐ:/, /æ:/, /e:/, /ø:/, /ɔ:/, /i:/, /y:/, and /o:/ in the two dialects. More short vowels can be found in the non-initial syllables of the Ogeled sub-dialect. However, not all short vowels in these positions are reduced vowels, which are largely related to their environment and syllable structure. The non-initial syllables in the Ogeled sub-dialect also feature the same long vowels, such as /ɐ:/, /æ:/, /i:/, /o:/, and /y:/. The vowel patterns of Ogeled Mongolian take the shape of acute-angled inverted triangles, with /i, ɐ, o/ or /i:, ɐ:, o:/ as the vertex vowels. There are more front vowels in the Ogeled sub-dialect, while the front and back vowels in Standard Mongolian are similar in number. The acoustic pattern of the Ogeled sub-dialect is characterized by a front-heavy pattern, while that of Standard Mongolian is more of a front-back balanced pattern. The long and short vowels in the initial and non-initial syllables of the Ogeled sub-dialect can be divided into six levels of vowel height: high, sub-high, medium-high, medium, sub-low, and low. In terms of front-back position, they can be divided into three levels: front, central, and back. Taken together, the sound pattern of this sub-dialect can be summarized as a “six-three pattern”. The dispersion of short vowels is not necessarily consistent with that of the corresponding long vowels. Rounded vowels are aligned to the right of their unrounded counterparts. For high and sub-high vowels, the diffusion of F1 is stronger than that of F2. The vowel /o/ in various positions and the initial /ɔ/ expand from back to front, while the initial and non-initial /o:/ vary in height. There are some differences between the distribution model of the vowels of the Ogeled dialect and that of Standard Mongolian vowels. The distribution model may be different for the same vowel in different vowel systems. The low vowels /ɐ/ and /ɐ:/ expand from front to back, with stronger F2 diffusion than that of F1 while


15 A STUDY ON SPEECH EMOTIONAL PERCEPTION OF CHILDREN WITH AUTISM 

QI Jing, WU Xiyu, YANG Jie, DU Jiamei

Abstract:

Autism Spectrum Disorder (ASD) is a serious pervasive developmental disorder. People with ASD generally have deficits in social-emotional interaction and communication. The study of emotional perception in autistic people is important for deepening the understanding of the pathological mechanisms of ASD. Regarding the emotional perception ability of autistic people, researchers have mainly focused on facial expressions. However, speech rhythm abnormalities are one of the key features of autism, and research on the perception of speech emotion in autistic children has important implications for the study and treatment of language disorders in children with autism. Research on speech emotion recognition is inadequate, especially in China. Are there deficits in the perception of speech emotion in autistic people? Are there differences between autistic children and typically developing children in emotion recognition patterns? Previous studies have not reached consistent conclusions.

In this study, twenty TD (typically developing) children and twenty children with ASD (Autism Spectrum Disorder) were matched on age and gender. They served as subjects in a speech emotion identification task with four emotions: happiness, anger, fear, and sadness. The four kinds of emotional speech were performed by the same speaker based on the same expository text, and the identification of speech emotions based on these materials reached a high level of agreement in tests with adults. By comparing the performance of TD children and ASD children in identifying the four emotional sounds through repeated-measures ANOVA and t-tests, we investigated whether the two groups differ in their ability to recognize speech emotions in terms of identification trends and confusions. We also conducted a correlation analysis between VB-MAPP (Verbal Behavior Milestones Assessment and Placement Program) scores and speech emotion perception scores in ASD children to explore the relationship between their task execution-related ability and their speech emotion perception ability.
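The correlation analysis mentioned above can be sketched as a Pearson product-moment correlation between two score lists. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the function name and example scores are hypothetical.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation between paired scores, e.g. one assessment
    # score and one identification score per child.
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)

# Made-up scores for four children, for illustration only.
assessment_scores = [10.0, 20.0, 30.0, 40.0]
identification_scores = [0.2, 0.4, 0.6, 0.8]
r = pearson_r(assessment_scores, identification_scores)
```

Here the toy scores are perfectly linearly related, so r is at its maximum of 1; real data would also need a significance test, which this sketch omits.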

As a result, we found that the recognition accuracy of ASD children was significantly lower than that of TD children. However, the patterns of emotional confusion and recognition preference of ASD children were basically consistent with those of TD children. Based on the above results and the significant correlations between the VB-MAPP scores and the emotional perception scores, it can be concluded that ASD children have no deficits in the ability to perceive emotions from speech. Their poor performance is caused by weak task execution-related ability rather than by emotional perception deficits.

The results of this study support the notion that the deficit in emotional perception is not a core deficit of autism. For children with autism, the scores on multiple items of the VB-MAPP correlated well with scores on the emotional speech identification task, suggesting that training in some of the basic abilities of children with autism may be useful in improving their performance on emotional speech perception. This also reveals that it is important to pay attention to the design of tasks when conducting experiments with ASD children. Sometimes it is the ability to execute tasks rather than the ability we want to explore that restricts the performance of autistic people.

Keywords:

Autism spectrum disorder, Emotional perception of speech, Speech prosody, Child


16 A STUDY ON PRODUCTION ERRORS IN THE LEARNING OF CHINESE ASPIRATED AND UNASPIRATED CONSONANTS BY RUSSIAN NATIVE SPEAKERS  

HUANG Heting, SHEN Mufen, WANG Zhen, WANG Wei

Abstract:

The aspiration of consonants is one of the main obstacles for second language learners learning Chinese pronunciation. Previous studies have focused on Japanese, South Korean, European, and American learners of Chinese. In the native languages of these learners, consonants mainly contrast in voicing or in laxness and tenseness. However, there are few studies on the acquisition of Chinese aspirated and unaspirated consonants by Russian native speakers (whose native consonants contrast in palatalization). On this basis, this paper addresses three questions: (1) Do Russian learners show any special performance in acquiring Mandarin aspirated and unaspirated consonants compared to speakers of languages with voiced-voiceless (e.g., English) and lax-tense (e.g., Korean) contrasts? (2) In terms of production, is it difficult for Russian learners to learn all Mandarin aspirated and unaspirated consonants? (3) Does it become less difficult for Russian learners to acquire Mandarin aspirated and unaspirated consonants as their Chinese proficiency improves?

Based on the BLCU-SAIT Chinese interlanguage corpus, we examined a total of 6960 aspirated and unaspirated tokens produced by 20 Russian learners. Taking two years of learning Chinese as the dividing line, we divided the learners into high-level and low-level groups according to their Chinese proficiency. We calculated the error rates of aspirated and unaspirated consonants in each proficiency group, and extracted two acoustic parameters from the errors in the consonant /t/: the voice onset time and the F1, F2, and F3 frequencies at the onset of the following vowel. In order to eliminate differences between individual speakers, we also converted the onset F1, F2, and F3 frequencies from Hertz to semitone values.
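The Hertz-to-semitone conversion and the error-rate calculation mentioned above can be sketched as follows. The standard formula is 12 · log2(f / f_ref); the reference frequency of 100 Hz is an assumption, since the abstract does not state which reference was used, and the function names and token counts are hypothetical.

```python
import math

def hz_to_semitones(freq_hz, ref_hz=100.0):
    # Semitone distance of freq_hz from ref_hz (ref_hz is an assumed
    # reference; 12 semitones correspond to a doubling of frequency).
    if freq_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(freq_hz / ref_hz)

def error_rate(n_errors, n_tokens):
    # Proportion of tokens produced with an aspiration error.
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return n_errors / n_tokens

octave_above = hz_to_semitones(200.0)     # one octave above the reference
toy_rate = error_rate(87, 348)            # made-up counts, not study data
```

Because the scale is logarithmic, equal frequency ratios map to equal semitone differences, which is why such a conversion is often used to reduce speaker-dependent differences in raw Hertz values.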

By testing whether the production error rates of aspirated and unaspirated consonants differ significantly between the two groups of learners, and by analyzing the two acoustic parameters in the articulation process, we found that (1) among the six pairs of aspirated and unaspirated consonants, /p/, /k/, /kʰ/, /tɕ/, /tʂ/ are not too difficult to acquire, while fossilization tends to occur in the acquisition of /t/, /tʰ/, /pʰ/, /tɕʰ/, /tʂʰ/; (2) the error rate of aspirated and unaspirated consonants produced by Russian learners decreased significantly with the improvement of their Chinese proficiency level; (3) there is significant palatalization of the consonant /t/ when it is produced with finals starting with /i/, due to the native spelling and reading rules. We hypothesize that if a consonant similar to a Chinese consonant exists in the learner's native language, [...] likely to produce this Chinese consonant palatalized before finals starting with /i/.

Keywords:
Aspirated and unaspirated consonants, Russian native speakers, Aspirated sound learning, Palatalization


17 A BOOK REVIEW ON LANGUAGE IN OUR BRAIN: THE ORIGINS OF A UNIQUELY HUMAN CAPACITY 

YANG Yufang

Angela D. Friederici's book, Language in Our Brain: The Origins of a Uniquely Human Capacity, has been translated into Chinese by Chen Luyao and several other young scholars. The Chinese translation is titled The Brain Origin of Human Language and has been officially published by Science Press. This is an important event for domestic scholars and graduate students engaged in research on linguistics and the cognitive neuroscience of language. Overall, the translation is excellent. The language is accurate and fluent, making the book easier for domestic readers to understand. The book's editing, cover design, graphics, and typography are also praiseworthy. In all respects, this is an academic book of high scholarly value and fine presentation that will surely be welcomed and praised by readers.
