Title

Learning speaker-specific characteristics with deep neural architecture

Author

Salman, Ahmad

Subject

Speaker Recognition ; Deep Learning

Classification

Library

Center and Library for Islamic Studies in European Languages

Location

Province: Qom, City: Qom

Library contact: 32910706-025

National bibliography number

Number

TLets558060

Title and statement of responsibility

Title proper

Learning speaker-specific characteristics with deep neural architecture

General material designation

[Thesis]

First statement of responsibility

Salman, Ahmad

Subsequent statement of responsibility

Chen, Ke

Publication, distribution, etc.

Name of publisher, distributor, etc.

University of Manchester

Date of publication, distribution, etc.

2012

Dissertation notes

Dissertation details and type of degree

Thesis (Ph.D.)

Date of degree

2012

Summary or abstract notes

Text of note

Robust Speaker Recognition (SR) has long been a focus of attention for researchers. Advances in speech-aided technologies, especially biometrics, highlight the need for foolproof SR systems. However, the performance of an SR system depends critically on the quality of the speech features used to represent speaker-specific information. This research aims to extract speaker-specific information from Mel-frequency Cepstral Coefficients (MFCCs) using deep learning. Speech is a mixture of several information components, including linguistic, speaker-specific and emotional-state information. Extracting features for each information component is necessary for robust performance in different speech-related tasks. However, almost all forms of speech representation carry all of this information as a whole, which compromises the performance of SR systems. Motivated by the ability of deep architectures to solve complex problems by learning high-level, task-specific information from data, we propose a novel Deep Neural Architecture (DNA) to extract speaker-specific information (SI) from MFCCs, a popular frequency-domain representation of the speech signal. A two-stage learning strategy is adopted: unsupervised training for network initialisation, followed by regularised contrastive learning. To train the network in the second stage, we devise a contrastive loss function that discriminates between speakers on the basis of their intrinsic statistical patterns, distributed in the representations yielded by the deep network. This is achieved through contrastive pair-wise comparison of these representations for similar or dissimilar speakers. To improve generalisation and reduce the interference of environmental effects with the speaker-specific representation, we regularise the contrastive loss with the data reconstruction loss in a multi-objective optimisation.
A detailed study analyses the parameter space in training the proposed deep architecture for optimum performance. Finally, we compare the performance of our learned speaker-specific representations with several state-of-the-art techniques in speaker verification and speaker segmentation tasks. It is evident that the representations acquired through the learned DNA are invariant and comparatively less sensitive to text, language and environmental variability.
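
The second-stage objective described in the abstract, a pair-wise contrastive loss combined with a data reconstruction loss, can be illustrated with a minimal sketch. This is not the loss devised in the thesis, which is based on the statistical patterns of the representations and is not specified in this record. The sketch swaps in a generic margin-based contrastive loss on Euclidean distance, and the function names, the `margin` parameter and the `alpha` weighting are all hypothetical choices made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(rep_a, rep_b, same_speaker, margin=1.0):
    """Generic margin-based contrastive loss (an assumption, not the thesis's loss).

    Same-speaker pairs are penalised by their squared distance;
    different-speaker pairs are penalised only if closer than `margin`.
    """
    d = euclidean(rep_a, rep_b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def reconstruction_loss(x, x_hat):
    """Mean squared error between an input frame and its reconstruction."""
    return sum((p - q) ** 2 for p, q in zip(x, x_hat)) / len(x)


def combined_loss(rep_a, rep_b, same_speaker,
                  x_a, x_a_hat, x_b, x_b_hat,
                  alpha=0.5, margin=1.0):
    """Weighted sum of the contrastive and reconstruction terms.

    `alpha` is a hypothetical trade-off weight between the two objectives.
    """
    c = contrastive_loss(rep_a, rep_b, same_speaker, margin)
    r = reconstruction_loss(x_a, x_a_hat) + reconstruction_loss(x_b, x_b_hat)
    return alpha * c + (1.0 - alpha) * r
```

In this sketch, `rep_a` and `rep_b` stand for the representations a network produces for two speech segments, and `x_a_hat` and `x_b_hat` for the network's reconstructions of the input features `x_a` and `x_b`. Raising `alpha` gives more weight to speaker discrimination, and lowering it gives more weight to faithful reconstruction.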

Subject (topical name or topical noun phrase)

Uncontrolled subject term

Speaker Recognition ; Deep Learning

Personal name as main entry (primary intellectual responsibility)

Personal name authority not verified

Salman, Ahmad

Personal name (secondary intellectual responsibility)

Personal name authority not verified

Chen, Ke

Added entry (corporate body)

Corporate body name authority not verified

University of Manchester

Electronic location and access

Electronic name

Publication status

Publication format

Bibliographic record information

Material type

[Thesis]

Worksheet code

276903

Record access information

Access level

Completed
