Title

Study of Robust Features in Emotional Speech Recognition

Author

Elnaz Forouhande

Subject

Classification

Library

Central Library of the University of Tabriz and Documentation and Publication Center

Location

Province: East Azerbaijan, City: Tabriz

Library contact: 04133294120, 04133294118

۱۹۷۳۸پ

per

Study of Robust Features in Emotional Speech Recognition

Elnaz Forouhande

Electrical and Computer Engineering

1395 (Solar Hijri)

124 p.

Print

Master's degree

Electrical Engineering, Communication Systems

1395/11/18 (Solar Hijri)

Tabriz

Humans encounter wide variability in the speech signals they perceive. Despite this variability, the auditory system can recognize the content of a specific speech signal within a mixture of sound sources. Such variability can arise from speaker-dependent characteristics, environmental noise, emotional state, and other factors. Despite recent advances in speech technology, speech recognition systems often struggle with the problems this variability causes. Emotions affect the acoustic and prosodic features of speech, and the resulting changes create a mismatch between a model trained on neutral speech and emotional input speech. This mismatch dramatically degrades the performance of the speech recognition system. Emotion-affected speech recognition (EASR) is a new field of speech recognition that could play an important role in human-machine interaction. Although EASR technology has advanced considerably in recent years, extracting robust features remains one of its main challenges. Approaches to compensating for the effects of emotion in EASR fall into three groups: 1) enhancement at the feature level, 2) enhancement at the acoustic model level, and 3) enhancement at the language model level. Among these, enhancement at the feature level can be more effective in reducing the recognition error rate of an EASR system, because a variety of state-of-the-art auditory models and techniques exist that can be considered for reducing distortion effects in real-world environments.
In this thesis, robust features are studied in an EASR system, and the vocal tract length normalization (VTLN) method is employed to normalize the effects of emotional states on the acoustic features. Among the different auditory features analyzed, PNCC is found to be a robust feature for EASR. VTLN is applied by frequency warping in the filterbank module, in the DCT module, or in a combination of the two. The results show improvements in the recognition rate of emotional speech with frequency warping in all three cases. However, experiments with the different warping procedures reveal that frequency warping in the DCT domain yields the best recognition accuracy.

‮‭Study of Robust Features in Emotional Speech Recognition‬

Forouhande, Elnaz

Black and white
