Towards a robust Arabic speech recognition system based on reservoir computing
[Thesis]
Alalshekmubarak, Abdulrahman
Smith, Leslie; Graham, Bruce
University of Stirling
2014
Thesis (Ph.D.)
2014
In this thesis we investigate the potential of developing a speech recognition system based on a recently introduced artificial neural network (ANN) technique, namely Reservoir Computing (RC). This technique has, in theory, a higher capability for modelling dynamic behaviour compared to feed-forward ANNs due to the recurrent connections between the nodes in the reservoir layer, which serves as a memory. We conduct this study on the Arabic language, (one of the most spoken languages in the world and the official language in 26 countries), because there is a serious gap in the literature on speech recognition systems for Arabic, making the potential impact high. The investigation covers a variety of tasks, including the implementation of the first reservoir-based Arabic speech recognition system. In addition, a thorough evaluation of the developed system is conducted including several comparisons to other state- of-the-art models found in the literature, and baseline models. The impact of feature extraction methods are studied in this work, and a new biologically inspired feature extraction technique, namely the Auditory Nerve feature, is applied to the speech recognition domain. Comparing different feature extraction methods requires access to the original recorded sound, which is not possible in the only publicly accessible Arabic corpus. We have developed the largest public Arabic corpus for isolated words, which contains roughly 10,000 samples. Our investigation has led us to develop two novel approaches based on reservoir computing, ESNSVMs (Echo State Networks with Support Vector Machines) and ESNEKMs (Echo State Networks with Extreme Kernel Machines). These aim to improve the performance of the conventional RC approach by proposing different readout architectures. These two approaches have been compared to the conventional RC approach and other state-of-the- art systems. Finally, these developed approaches have been evaluated on the presence of different types and levels of noise to examine their resilience to noise, which is crucial for real world applications.
Reservoir computing ; Speech recognition ; Speech corpus ; Arabic language ; Arabic languages ; Computational linguistics ; Translators (Computer programs) ; Speech perception